Abstract

We consider a size-structured population describing the cell divisions. The cell 
population is described by an empirical measure and we observe the divisions in the 
continuous time interval [0, T]. We address here the problem of estimating the division
kernel h (or fragmentation kernel) in case of complete data. An adaptive estimator 
of h is constructed based on a kernel function K with a fully data-driven bandwidth 
selection method. We obtain an oracle inequality and an exponential convergence 
rate, for which optimality is considered. 

Keywords: random size-structured population, division kernel, nonparametric estimation,
Goldenshluger-Lepski's method, adaptive estimator, penalization, optimal rate.


1 Introduction 

Models for populations of dividing cells, possibly differentiated by covariates such as size,
have been the subject of an abundant literature in recent years (starting from Athreya
and Ney [3], Harris [18], Jagers, ...). Covariates termed 'size' are variables that grow
deterministically with time (such as volume, length, level of certain proteins, DNA content,
etc.). Such models of structured populations describe the evolution of the
size distribution, which can be interesting for applications. For instance, in the spirit of
Stewart et al. [33], we can imagine that each cell contains some toxicity whose quantity
plays the role of the size. Asymmetric divisions of the cells, where one daughter
contains more toxicity than the other, can under some conditions purge the
toxicity from the population by concentrating it into a few lineages. These results are linked
with the concept of aging for cell lineages. This concept has been tackled in many papers
(e.g. Ackermann et al., Aguilaniu et al., C-Y. Lai et al. [22], Evans and Steinsaltz,
Moseley [27], ...).

Here we consider a stochastic individual-based model of a size-structured population in
continuous time, where individuals are cells undergoing asymmetric binary divisions and
whose size is the quantity of toxicity they contain. A cell containing a toxicity $x \in \mathbb{R}_+$
divides at a rate $R > 0$. The toxicity grows inside the cell with rate $\alpha > 0$. When a
cell divides, a random fraction $\Gamma \in [0,1]$ of the toxicity goes in the first daughter cell and
$1 - \Gamma$ in the second one. If $\Gamma = 1/2$, the daughters are the same, with toxicity $x/2$. We assume
that $\Gamma$ has a symmetric distribution on $[0,1]$ with a density $h$ with respect to Lebesgue
measure such that $\mathbb{P}(\Gamma = 0) = \mathbb{P}(\Gamma = 1) = 0$. If $h$ is peaked at $1/2$ (i.e. $\Gamma \simeq 1/2$), then both
daughters contain the same toxicity, i.e. half of their mother's toxicity. The more $h$
puts weight in the neighbourhood of 0 and 1, the more asymmetric the divisions are, with
one daughter having little toxicity and the other a toxicity close to its mother's. If
we consider that having a lot of toxicity is a kind of senescence, then the kurtosis of $h$
provides an indication of aging phenomena (see [?]).

Figure 1: Trajectories of two daughter cells after a division, separating after the first division at
time $t_1$.

Modifications of this model to account for more complex phenomena have been considered
in other papers. Bansaye and Tran [6], Cloez or Tran [35] consider non-constant
division and growth rates. Robert et al. [29] study whether divisions can occur only
when a size threshold is reached. Our purpose here is to estimate the density $h$ ruling the
divisions, and we stick to constant rates $R$ and $\alpha$ for the sake of simplicity. Notice that
several similar models for binary cell division in discrete time also exist in the literature
and have motivated statistical questions as here; see for instance Bansaye et al.,
Bercu et al. [8], Bitseki Penda, Delmas and Marsalle [12] or Guyon.

Individual-based models provide a natural framework for statistical estimation. Estimation
of the division rate is, for instance, the subject of Doumic et al. and
Hoffmann and Olivier. Here, the density $h$ is the division kernel that we want to estimate.
Assuming that we observe the divisions of cells in continuous time on the interval
$[0, T]$, with $T > 0$, we propose an adaptive kernel estimator $\hat h$ of $h$ for which we obtain
an oracle inequality in Theorem 2. The construction of $\hat h$ is detailed in the sequel. From
the oracle inequality we can infer adaptive exponential rates of convergence with respect to
$T$ depending on $\beta$, the smoothness of the density. Most of the time, nonparametric rates
are of the form $n^{-2\beta/(2\beta+1)}$ (see for instance Tsybakov [36]) and exponential rates are not
often encountered in the literature. The exponential rates are due to binary splitting:
the number of cells, i.e. the sample size, increases exponentially in $\exp(RT)$ (see Section
2.3). By comparison, Hoffmann and Olivier obtain a similar rate of convergence
$\exp\big(-\lambda_B \frac{\beta}{2\beta+1} T\big)$ for the kernel estimator of their division rate $B(x)$, where $\lambda_B$ is the
Malthus parameter and $\beta > 0$ is the smoothness of $B(x)$. However, their estimator $\hat B_T$
of $B$ is not adaptive since the choice of their optimal bandwidth still depends on $\beta$. Our
estimator is adaptive with an "optimal" bandwidth chosen from a data-driven method.
We derive upper bounds and lower bounds for asymptotic minimax risks on Hölder classes
and show that they coincide. Hence, the rate of convergence of our estimator $\hat h$ proves to
be optimal in the minimax sense on the Hölder classes.

This paper is organized as follows. In Section 2, we introduce a stochastic differential
equation driven by a Poisson point measure to describe the population of cells. Then,
we construct the estimator of $h$ and obtain upper and lower bounds for the MISE (Mean
Integrated Squared Error). Our main results are stated in Theorems 2, 3 and 4. Numerical
results and discussions about the aging effect are presented in Section 3. The main proofs are
given in Section 4.

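Notation. We introduce some notation used in the sequel.

Hereafter, $\|\cdot\|_1$ and $\|\cdot\|_2$ denote the $L^1$ and $L^2$ norms on $\mathbb{R}$ with respect to Lebesgue
measure:

$$\|f\|_1 = \int_{\mathbb{R}} |f(\gamma)|\,d\gamma, \qquad \|f\|_2 = \Big(\int_{\mathbb{R}} |f(\gamma)|^2\,d\gamma\Big)^{1/2}.$$

The $L^\infty$ norm is defined by

$$\|f\|_\infty = \sup_{\gamma \in (0,1)} |f(\gamma)|.$$

Finally, $f * g$ denotes the convolution of two functions $f$ and $g$ defined by

$$f * g(\gamma) = \int_{\mathbb{R}} f(u)\,g(\gamma - u)\,du.$$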

2 Microscopic model and kernel estimator of h 

2.1 The model 

We recall the Ulam-Harris-Neveu notation used to describe the genealogical tree. The first
cell is labelled by $\emptyset$ and when the cell $i$ divides, the two descendants are labelled by $i0$ and
$i1$. The set of labels is

$$\mathcal{J} = \{\emptyset\} \cup \bigcup_{m=1}^{\infty} \{0,1\}^m. \tag{1}$$

We denote by $V_t$ the set of cells alive at time $t$, with $V_t \subset \mathcal{J}$.

Let $\mathcal{M}_F(\mathbb{R}_+)$ be the space of finite measures on $\mathbb{R}_+$ endowed with the topology of
weak convergence and let $X_t^i$ be the quantity of toxicity in the cell $i$ at time $t$. We describe
the population of cells at time $t$ by a random point measure in $\mathcal{M}_F(\mathbb{R}_+)$:

$$Z_t(dx) = \sum_{i=1}^{N_t} \delta_{X_t^i}(dx), \quad \text{where } N_t = \langle Z_t, 1 \rangle = \int_{\mathbb{R}_+} Z_t(dx) \tag{2}$$

is the number of individuals living at time $t$. For a measure $\mu \in \mathcal{M}_F(\mathbb{R}_+)$ and a positive
function $f$, we use the notation $\langle \mu, f \rangle = \int_{\mathbb{R}_+} f\,d\mu$.

Along branches of the genealogical tree, the toxicity $(X_t, t \ge 0)$ satisfies

$$dX_t = \alpha\,dt, \tag{3}$$

with $X_0 = x_0$. When the cells divide, the toxicity is shared between the daughter cells.
This is described by the following stochastic differential equation (SDE).

Let $Z_0 \in \mathcal{M}_F(\mathbb{R}_+)$ be an initial condition such that

$$\mathbb{E}(\langle Z_0, 1 \rangle) < +\infty, \tag{4}$$

and let $Q(ds, di, d\gamma)$ be a Poisson point measure on $\mathbb{R}_+ \times \mathcal{E} := \mathbb{R}_+ \times \mathcal{J} \times [0,1]$ with intensity
$q(ds, di, d\gamma) = R\,ds\,n(di)\,H(d\gamma)$, where $n(di)$ is the counting measure on $\mathcal{J}$ and $ds$ is Lebesgue
measure on $\mathbb{R}_+$. We denote by $(\mathcal{F}_t)_{t \ge 0}$ the canonical filtration associated with the Poisson
point measure and the initial condition. The stochastic process $(Z_t)_{t \ge 0}$ can be described
by an SDE as follows.

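Definition 1. For every test function $f_t(x) = f(x,t) \in C_b^{1,1}(\mathbb{R}_+ \times \mathbb{R}_+, \mathbb{R})$ (bounded, of
class $C^1$ in $t$ and $x$ with bounded derivatives), the population of cells is described by:

$$\langle Z_t, f_t \rangle = \langle Z_0, f_0 \rangle + \int_0^t \int_{\mathbb{R}_+} \big(\partial_s f_s(x) + \alpha\,\partial_x f_s(x)\big) Z_s(dx)\,ds$$
$$\qquad + \int_0^t \int_{\mathcal{E}} 1_{\{i \le N_{s-}\}} \Big[ f_s\big(\gamma X^i_{s-}\big) + f_s\big((1-\gamma) X^i_{s-}\big) - f_s\big(X^i_{s-}\big) \Big] Q(ds, di, d\gamma). \tag{5}$$

The second term in the right hand side of (5) corresponds to the growth of toxicities
in the cells and the third term describes cell divisions, where the sharing of
toxicity between the two daughter cells depends on the random fraction $\Gamma$.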
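The dynamics described by (5) can be sketched as an event-driven simulation. The following is only an illustrative sketch under stated assumptions, not the code used for the simulations of Section 3: the function name `simulate_divisions`, its parameters and the default Beta(2,2) kernel are hypothetical choices. Each cell divides at rate $R$, so the next division occurs after an exponential time with parameter $R N_t$ and the dividing cell is chosen uniformly; between divisions every toxicity grows linearly with slope $\alpha$.

```python
import random


def simulate_divisions(T, R=0.5, alpha=0.35, n0=1, x0=1.0,
                       sample_fraction=None, seed=0):
    """Simulate the population on [0, T].

    Returns (cells, fractions): the toxicities at time T and the list of
    fractions Gamma drawn at each division (hypothetical helper).
    """
    rng = random.Random(seed)
    if sample_fraction is None:
        # Assumed division kernel h: Beta(2, 2), symmetric around 1/2.
        sample_fraction = lambda r: r.betavariate(2.0, 2.0)
    t = 0.0
    cells = [x0] * n0          # toxicities of the cells alive at time t
    fractions = []             # observed fractions, one per division
    while True:
        # Total division rate is R * N_t.
        dt = rng.expovariate(R * len(cells))
        if t + dt > T:
            # No further division before T: only growth remains.
            cells = [x + alpha * (T - t) for x in cells]
            return cells, fractions
        t += dt
        cells = [x + alpha * dt for x in cells]   # deterministic growth
        i = rng.randrange(len(cells))             # dividing cell
        gamma = sample_fraction(rng)
        mother = cells[i]
        cells[i] = gamma * mother                 # first daughter
        cells.append((1.0 - gamma) * mother)      # second daughter
        fractions.append(gamma)
```

Updating every cell at each event costs a time proportional to the population size; this is acceptable for a sketch with moderate $T$, while a faster version would store the toxicities relative to a global clock.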

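We now state some properties of $N_t$ that are useful in the sequel.

Proposition 1. Let $T > 0$, and assume that the initial condition $N_0$, the number of mother
cells at time $t = 0$, is deterministic, for the sake of simplicity. We have:

i) Let $T_j$ be the $j$-th jump time. Then:

$$\lim_{j \to +\infty} T_j = +\infty \quad \text{and} \quad \lim_{T \to +\infty} N_T = +\infty \quad \text{(a.s.)}. \tag{6}$$

ii) $N_T$ is distributed according to a negative binomial distribution, denoted by $\mathcal{NB}(N_0, e^{-RT})$.
Its probability mass function is then

$$\mathbb{P}(N_T = n) = \binom{n-1}{n-N_0} \big(e^{-RT}\big)^{N_0} \big(1 - e^{-RT}\big)^{n-N_0} \tag{7}$$

for $n \ge N_0$. When $N_0 = 1$, $N_T$ has a geometric distribution

$$\mathbb{P}(N_T = n) = e^{-RT} \big(1 - e^{-RT}\big)^{n-1}. \tag{8}$$

Consequently, we have

$$\mathbb{E}[N_T] = N_0 e^{RT}. \tag{9}$$

iii) When $N_0 = 1$:

$$\mathbb{E}\Big[\frac{1}{N_T}\Big] = \frac{RT e^{-RT}}{1 - e^{-RT}}. \tag{10}$$

When $N_0 > 1$, we have:

$$\mathbb{E}\Big[\frac{1}{N_T}\Big] = \Big(\frac{e^{-RT}}{1 - e^{-RT}}\Big)^{N_0} (-1)^{N_0 - 1} \Big( \sum_{k=1}^{N_0 - 1} \binom{N_0 - 1}{k} (-1)^k \frac{e^{kRT} - 1}{k} + RT \Big). \tag{11}$$

iv) Furthermore, when $N_0 > 1$, we have

$$\frac{e^{-RT}}{N_0} \le \mathbb{E}\Big[\frac{1}{N_T}\Big] \le \frac{e^{-RT}}{N_0 - 1}. \tag{12}$$

The proof of Proposition 1 is presented in Section 4.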
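The moments of $1/N_T$ in Proposition 1 can be checked numerically against the negative binomial law. The sketch below is a sanity check under assumptions: the helper names are hypothetical, and the closed form coded for $N_0 > 1$ is the one obtained by integrating the negative binomial series term by term (with the factor $(e^{kRT} - 1)/k$), since the printed formula is damaged in the source. It compares the closed forms with a truncated series and checks the two-sided bound of item iv).

```python
import math


def nb_pmf(n, n0, p):
    """P(N_T = n) for the negative binomial law NB(n0, p), n >= n0."""
    return math.comb(n - 1, n - n0) * p ** n0 * (1.0 - p) ** (n - n0)


def mean_inverse_series(n0, p, n_max=5000):
    """E[1/N_T] computed by summing the series up to n_max (truncation)."""
    return sum(nb_pmf(n, n0, p) / n for n in range(n0, n_max))


def mean_inverse_closed(n0, R, T):
    """Closed form of E[1/N_T] with p = exp(-R*T) (reconstructed formula)."""
    p = math.exp(-R * T)
    if n0 == 1:
        return R * T * p / (1.0 - p)
    s = sum(math.comb(n0 - 1, k) * (-1) ** k * (math.exp(k * R * T) - 1.0) / k
            for k in range(1, n0))
    return (p / (1.0 - p)) ** n0 * (-1) ** (n0 - 1) * (s + R * T)


def bounds(n0, R, T):
    """Lower and upper bounds on E[1/N_T], valid for n0 > 1."""
    p = math.exp(-R * T)
    return p / n0, p / (n0 - 1)
```

The alternating sum loses precision for large $N_0$ or large $RT$; for such values the series is the safer numerical route.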


2.2 Influence of age 

In this section, we study the aging effect via the mean age, which is defined as follows.

Definition 2. The mean age of the cell population up to time $t \in \mathbb{R}_+$ is defined by:

$$\bar X_t = \frac{1}{N_t} \sum_{i=1}^{N_t} X_t^i = \frac{\langle Z_t, f \rangle}{N_t}, \tag{13}$$

where $f(x) = x$.

Following the work of Bansaye et al. [5], we note that the long time behavior of
the mean age is related to the law of an auxiliary process $Y$ with initial value $Y_0$ and
infinitesimal generator characterized for all $f \in C_b^{1}(\mathbb{R}_+, \mathbb{R})$ by

$$Af(x) = \alpha f'(x) + 2R \int_0^1 \big(f(\gamma x) - f(x)\big) h(\gamma)\,d\gamma. \tag{14}$$

The empirical distribution $\frac{1}{N_t} \sum_{i=1}^{N_t} \delta_{X^i}$ gives the law of the path of a particle chosen
at random at time $t$. Heuristically, the distribution of $Y$ restricted to $[0, t]$ approximates
this distribution. This explains the coefficient 2, which is a size-biased phenomenon:
when one chooses a cell in the population at time $t$, a cell belonging to a branch with
more descendants is more likely to be chosen.


Lemma 1. Let $Y$ be the auxiliary process with infinitesimal generator (14). For $t \in \mathbb{R}_+$,

$$Y_t = \Big(Y_0 - \frac{\alpha}{R}\Big) e^{-Rt} + \frac{\alpha}{R} + \int_0^t e^{-R(t-s)}\,dU_s, \tag{15}$$

where $U_t$ is a square-integrable martingale.
Consequently, we have

$$\mathbb{E}[Y_t] = \Big(Y_0 - \frac{\alpha}{R}\Big) e^{-Rt} + \frac{\alpha}{R} \tag{16}$$

and

$$\lim_{t \to \infty} \mathbb{E}[Y_t] = \frac{\alpha}{R}. \tag{17}$$

We will show that the auxiliary process $Y$ satisfies ergodic properties (see Section 4.3),
which entail the following theorem.

Theorem 1. Assume that there exists $\underline{h} > 0$ such that for all $\gamma \in (0,1)$, $h(\gamma) \ge \underline{h}$. Then

$$\lim_{t \to +\infty} \bar X_t = \lim_{t \to +\infty} \mathbb{E}(Y_t) = \frac{\alpha}{R}. \tag{18}$$

Theorem 1 is a consequence of the ergodic properties of $Y$, of Theorem 4.2 in Bansaye
et al. [5] and of Lemma 1. It shows that the mean age tends to the constant
$\alpha/R$ when the time $t$ is large. Simulations in Section 3 illustrate the results. The proofs
of Lemma 1 and Theorem 1 are presented in Section 4.2 and Section 4.3.
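The expectation formula (16) can be illustrated by simulating the auxiliary process directly from its generator (14): $Y$ grows linearly with slope $\alpha$ and, at rate $2R$, jumps from $y$ to $\gamma y$ with $\gamma$ drawn from $h$. This is a minimal Monte Carlo sketch under assumptions; the helper names, the Beta(2,2) kernel and the parameter values are illustrative choices and not part of the paper's code.

```python
import math
import random


def simulate_auxiliary(t_end, y0, alpha, R, rng, sample_fraction):
    """One path of the auxiliary process Y, returning Y at time t_end."""
    t, y = 0.0, y0
    while True:
        dt = rng.expovariate(2.0 * R)      # jumps occur at rate 2R
        if t + dt > t_end:
            return y + alpha * (t_end - t)
        t += dt
        # Grow until the jump time, then keep a fraction gamma of the value.
        y = sample_fraction(rng) * (y + alpha * dt)


def mean_auxiliary(t, y0, alpha, R):
    """Closed form (16) for E[Y_t] when h is symmetric around 1/2."""
    return (y0 - alpha / R) * math.exp(-R * t) + alpha / R


def monte_carlo_mean(t_end, y0, alpha, R, n_paths=4000, seed=0):
    """Empirical mean of Y at time t_end over n_paths independent paths."""
    rng = random.Random(seed)
    draw = lambda r: r.betavariate(2.0, 2.0)   # assumed symmetric kernel
    total = sum(simulate_auxiliary(t_end, y0, alpha, R, rng, draw)
                for _ in range(n_paths))
    return total / n_paths
```

Only the mean $1/2$ of the kernel enters (16), so any symmetric $h$ gives the same limit $\alpha/R$; the kernel affects the second moment of $Y_t$ only.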


Remark 1. When the population is large, we are interested in studying the asymptotic
behavior of the random point measure. As in Doumic et al., we can show that our
stochastic model is approximated by a growth-fragmentation partial differential equation.
This problem is a work in progress.


2.3 Estimation of the division kernel 


Data and construction of the estimator

Suppose that we observe the evolution of the cell population in a given time interval $[0, T]$.
At the $i$-th division time $t_i$, let us denote by $j_i$ the individual who splits into two daughters
$X^{j_i 0}_{t_i}$ and $X^{j_i 1}_{t_i}$ and define

$$\Gamma_i^0 = \frac{X^{j_i 0}_{t_i}}{X^{j_i}_{t_i -}} \quad \text{and} \quad \Gamma_i^1 = \frac{X^{j_i 1}_{t_i}}{X^{j_i}_{t_i -}},$$

the random fractions that go into the daughter cells, with the convention $\frac{0}{0} = 0$.

$\Gamma_i^0$ and $\Gamma_i^1$ are exchangeable with $\Gamma_i^0 + \Gamma_i^1 = 1$. They are thus not independent,
but the couples $(\Gamma_i^0, \Gamma_i^1)_{i \in \mathbb{N}^*}$ are independent and identically distributed with distribution
$(\Gamma^0, \Gamma^1)$, where $\Gamma^0 \sim H(d\gamma)$ and $\Gamma^1 = 1 - \Gamma^0$.

Since $h$ is a density function, it is natural to use a kernel method. We define an
estimator $\hat h_\ell$ of $h$ based on the data $(\Gamma_i^0, \Gamma_i^1)_{i \in \mathbb{N}^*}$ as follows.

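Definition 3. Let $K : \mathbb{R} \to \mathbb{R}$ be an integrable function such that

$$\int_{\mathbb{R}} K(x)\,dx = 1 \quad \text{and} \quad \int_{\mathbb{R}} K^2(x)\,dx < \infty.$$

Let $M_T$ be the random number of divisions in the time interval $[0, T]$ and assume that
$M_T > 0$. For all $\gamma \in (0,1)$, define

$$\hat h_\ell(\gamma) = \frac{1}{M_T} \sum_{i=1}^{M_T} K_\ell\big(\gamma - \Gamma_i^1\big), \tag{19}$$

where $K_\ell = \frac{1}{\ell} K(\cdot / \ell)$ and $\ell > 0$ is the bandwidth to be chosen.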
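The estimator (19) is a standard kernel density estimator applied to the observed fractions. The sketch below is a minimal illustration, assuming a Gaussian kernel; the function names `gaussian_kernel` and `kernel_estimate` are hypothetical.

```python
import math
import random


def gaussian_kernel(x):
    """Standard Gaussian density, one admissible choice for K."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def kernel_estimate(gamma, fractions, bandwidth, kernel=gaussian_kernel):
    """Evaluate the kernel estimator of h at the point gamma.

    fractions: the observed division fractions (one per division),
    bandwidth: the bandwidth ell > 0.
    """
    m = len(fractions)
    if m == 0:
        raise ValueError("at least one division is required (M_T > 0)")
    total = sum(kernel((gamma - g) / bandwidth) for g in fractions)
    return total / (m * bandwidth)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.betavariate(2.0, 2.0) for _ in range(500)]
    # The Beta(2,2) density equals 1.5 at gamma = 1/2.
    print(kernel_estimate(0.5, data, bandwidth=0.1))
```

With a kernel supported on the whole real line, part of the estimated mass falls outside $[0,1]$, which is consistent with norms being taken on $\mathbb{R}$ in the notation above.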


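Remark 2. Since $N_0 \ne 0$, the number of random divisions is not equal to the number
of individuals living at time $T$. Indeed, we have $M_T = N_T - N_0$.

In (19), $\hat h_\ell$ also depends on $T$. However, we omit $T$ to lighten the notation. The
estimator $\hat h_\ell$ satisfies the following properties.

Proposition 2.

i) The conditional expectation and conditional variance of $\hat h_\ell(\gamma)$ given $M_T$, and the variance
of $\hat h_\ell(\gamma)$, are:

$$\mathbb{E}[\hat h_\ell(\gamma) \mid M_T] = K_\ell * h(\gamma) \quad \text{and} \quad \mathbb{E}[\hat h_\ell(\gamma)] = K_\ell * h(\gamma), \tag{20}$$

$$\mathrm{Var}[\hat h_\ell(\gamma) \mid M_T] = \frac{1}{M_T} \mathrm{Var}\big[K_\ell(\gamma - \Gamma_1^1)\big], \tag{21}$$

$$\mathrm{Var}[\hat h_\ell(\gamma)] = \mathbb{E}\Big[\frac{1}{M_T}\Big] \mathrm{Var}\big[K_\ell(\gamma - \Gamma_1^1)\big]. \tag{22}$$

Consequently, we have $\mathbb{E}[\hat h_\ell(\gamma) \mid M_T] = \mathbb{E}[\hat h_\ell(\gamma)]$.

ii) For all $\gamma \in (0,1)$,

$$\lim_{T \to +\infty} \hat h_\ell(\gamma) = K_\ell * h(\gamma) \quad \text{(a.s.)}. \tag{23}$$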

Adaptive estimation of h by Goldenshluger and Lepski's (GL) method

Let $\hat h_\ell$ be the kernel estimator of $h$ as in Definition 3. We measure the performance of $\hat h_\ell$
via its $L^2$-loss, i.e. the average $L^2$ distance between $\hat h_\ell$ and $h$. The objective is to find a
bandwidth which minimizes this $L^2$-loss. Since $M_T$ is random, we first study the $L^2$-loss
conditionally on $M_T$.

Proposition 3. The $L^2$-loss of $\hat h_\ell$ given $M_T$ satisfies:

$$\mathbb{E}\big[\|\hat h_\ell - h\|_2 \mid M_T\big] \le \|h - K_\ell * h\|_2 + \frac{\|K\|_2}{\sqrt{M_T \ell}}. \tag{24}$$

In the right hand side of the risk decomposition (24), the first term is a bias term:
it decreases when $\ell \to 0$, whereas the second term, which is a variance term, increases
when $\ell \to 0$. The best choice of $\ell$ should minimize this bias-variance trade-off. Thus, from
a finite family of bandwidths $H$, the best bandwidth would be

$$\bar\ell := \operatorname*{argmin}_{\ell \in H} \Big\{ \|h - K_\ell * h\|_2 + \frac{\|K\|_2}{\sqrt{M_T \ell}} \Big\}. \tag{25}$$

The bandwidth $\bar\ell$ is called "the oracle bandwidth" since it depends on $h$, which is
unknown, and therefore it cannot be used in practice. Since the oracle bandwidth minimizes a
bias-variance trade-off, we need to find an estimate of the bias-variance decomposition
of $\hat h_\ell$. Goldenshluger and Lepski developed a fully data-driven bandwidth selection
method (GL method). The main idea of this method is to estimate the
bias term by looking at several estimators. In a similar fashion, Doumic et al. and
Reynaud-Bouret et al. [31] have used this method. To apply the GL method, we set for
any $\ell, \ell' \in H$:

$$\hat h_{\ell, \ell'}(\gamma) := \frac{1}{M_T} \sum_{i=1}^{M_T} (K_\ell * K_{\ell'})\big(\gamma - \Gamma_i^1\big) = (K_\ell * \hat h_{\ell'})(\gamma).$$

Finally, the adaptive bandwidth and the estimator of $h$ are selected as follows:

Definition 4. Given $\epsilon > 0$ and setting $\chi := (1 + \epsilon)(1 + \|K\|_1)$, we define

$$\hat\ell := \operatorname*{argmin}_{\ell \in H} \Big\{ A(\ell) + \frac{\chi \|K\|_2}{\sqrt{M_T \ell}} \Big\}, \tag{26}$$

where, for any $\ell \in H$,

$$A(\ell) := \sup_{\ell' \in H} \Big\{ \|\hat h_{\ell, \ell'} - \hat h_{\ell'}\|_2 - \frac{\chi \|K\|_2}{\sqrt{M_T \ell'}} \Big\}_+. \tag{27}$$

Then, the estimator $\hat h$ is given by

$$\hat h := \hat h_{\hat\ell}. \tag{28}$$

An inspection of the proof of Theorem 2 shows that the term $A(\ell)$ provides a control
for the bias $\|h - K_\ell * h\|_2$ up to the term $\|K\|_1$ (see (45) and (47) in the proof of Theorem 2
in Section 4). Since $A(\ell)$ depends only on $\hat h_{\ell, \ell'}$ and $\hat h_{\ell'}$, the estimator $\hat h$ can be computed
in practice.
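The selection rule of Definition 4 can be sketched as follows for a Gaussian kernel. This is an illustrative sketch under assumptions, not the authors' implementation: the function names, the integration grid on $[-5, 6]$ and the default value of `eps` are hypothetical choices. It relies on the fact that, for Gaussian kernels, $K_\ell * K_{\ell'} = K_{\sqrt{\ell^2 + \ell'^2}}$, so the doubly smoothed estimator is itself a kernel estimator with a larger bandwidth.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)
K_L1 = 1.0                                          # ||K||_1, Gaussian kernel
K_L2 = 1.0 / math.sqrt(2.0 * math.sqrt(math.pi))    # ||K||_2, Gaussian kernel


def kde_on_grid(sample, bandwidth, grid):
    """Gaussian kernel estimator evaluated at every point of the grid."""
    m = len(sample)
    values = []
    for x in grid:
        s = 0.0
        for g in sample:
            z = (x - g) / bandwidth
            s += math.exp(-0.5 * z * z)
        values.append(s / (m * bandwidth * SQRT_2PI))
    return values


def l2_distance(u, v, step):
    """Riemann-sum approximation of the L2 distance between two functions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) * step)


def gl_bandwidth(sample, bandwidths, eps=0.5, step=0.02):
    """Return the bandwidth minimizing A(ell) + chi*||K||_2/sqrt(M*ell)."""
    m = len(sample)
    grid = [-5.0 + step * k for k in range(int(round(11.0 / step)) + 1)]
    chi = (1.0 + eps) * (1.0 + K_L1)
    penalty = {l: chi * K_L2 / math.sqrt(m * l) for l in bandwidths}
    plain = {l: kde_on_grid(sample, l, grid) for l in bandwidths}
    smoothed = {}                       # cache, symmetric in (ell, ell')
    best, best_crit = None, float("inf")
    for l in bandwidths:
        a = 0.0                         # positive part: A(ell) >= 0
        for lp in bandwidths:
            key = (min(l, lp), max(l, lp))
            if key not in smoothed:
                smoothed[key] = kde_on_grid(sample, math.hypot(l, lp), grid)
            dist = l2_distance(smoothed[key], plain[lp], step)
            a = max(a, dist - penalty[lp])
        crit = a + penalty[l]
        if crit < best_crit:
            best, best_crit = l, crit
    return best
```

A typical call would be `gl_bandwidth(fractions, [1.0 / k for k in range(1, 7)])`. The cost grows with the square of the number of bandwidths, which is one reason to keep the family $H$ small.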


We shall now state an oracle inequality which highlights the bias-variance decomposition
of the MISE of $\hat h$. We recall that the MISE of $\hat h$ is the quantity $\mathbb{E}\big[\|\hat h - h\|_2^2\big]$.

Theorem 2. Let $T > 0$ and assume that observations are taken on $[0, T]$. Let $N_0$ be the
number of mother cells at the beginning of divisions and $M_T$ the random number of
divisions in $[0, T]$. Consider $H$ a countable subset of $\{\Delta^{-1} : \Delta = 1, \ldots, \Delta_{\max}\}$ in which
we choose the bandwidths, with $\Delta_{\max} = \lfloor \delta M_T \rfloor$ for some $\delta > 0$. Assume $h \in L^\infty([0,1])$ and
let $\hat h$ be a kernel estimator defined with the kernel $K_{\hat\ell}$, where $\hat\ell$ is chosen by the GL method.
Define

$$\varrho(T)^{-1} = \begin{cases} \dfrac{e^{-RT + \log(RT)}}{1 - e^{-RT}}, & \text{if } N_0 = 1, \\[2mm] e^{-RT}, & \text{if } N_0 > 1. \end{cases} \tag{29}$$

For large $T$, the main term in $\varrho(T)$ is $e^{RT}$ in any case. It is exactly the order of $\varrho(T)$
for $N_0 > 1$. Then, given $\epsilon > 0$,

$$\mathbb{E}\big[\|\hat h - h\|_2^2\big] \le C_1 \inf_{\ell \in H} \Big\{ \|K_\ell * h - h\|_2^2 + \frac{\|K\|_2^2}{\ell} \varrho(T)^{-1} \Big\} + C_2\,\varrho(T)^{-1}, \tag{30}$$

where $C_1$ is a constant depending on $N_0$, $\|K\|_1$ and $\epsilon$, and $C_2$ is a constant depending on
$N_0$, $\delta$, $\epsilon$, $\|K\|_1$, $\|K\|_2$ and $\|h\|_\infty$.

The term $\|K_\ell * h - h\|_2^2$ is an approximation term, $\frac{\|K\|_2^2}{\ell} \varrho(T)^{-1}$ is a variance term and
the last term $C_2\,\varrho(T)^{-1}$ is asymptotically negligible. Hence the right hand side of the oracle
inequality corresponds to a bias-variance trade-off.

We now establish upper and lower bounds for the MISE. The lower bound is obtained
by perturbation methods (Theorem 4) and is valid for any estimator $\hat h_T$ of $h$, thus indicating
the optimal convergence rate. The upper bound is obtained in Theorem 3 thanks
to the key oracle inequality of Theorem 2.

For the rate of convergence, it is necessary to assume that the density $h$ and the kernel
function $K$ satisfy some regularity conditions introduced in the following definitions.

Definition 5. Let $\beta > 0$ and $L > 0$. The Hölder class of smoothness $\beta$ and radius $L$ is
defined by

$$\mathcal{H}(\beta, L) = \Big\{ f : f \text{ has } k = \lfloor \beta \rfloor \text{ derivatives and } \forall x, y \in \mathbb{R},\ |f^{(k)}(y) - f^{(k)}(x)| \le L |x - y|^{\beta - k} \Big\}.$$

Definition 6. Let $\beta^* > 0$. An integrable function $K$ is a kernel of order $\beta^*$ if

• $\int K(x)\,dx = 1$,

• $\int |x|^{\beta^*} |K(x)|\,dx < \infty$,

• for $k = \lfloor \beta^* \rfloor$, $\forall\, 1 \le j \le k$, $\int x^j K(x)\,dx = 0$.

Then, the following theorem gives the rate of convergence of the adaptive estimator $\hat h$.

Theorem 3. Let $\beta^* > 0$ and $K$ be a kernel of order $\beta^*$. Let $\beta \in (0, \beta^*)$. Let $\hat\ell$ be the
adaptive bandwidth defined in (26). Then, for any $T > 0$, the kernel estimator $\hat h$ satisfies

$$\sup_{h \in \mathcal{H}(\beta, L)} \mathbb{E}\|\hat h - h\|_2^2 \le C_3\,\varrho(T)^{-\frac{2\beta}{2\beta+1}}, \tag{31}$$

where $\varrho(T)^{-1}$ is defined in (29) and $C_3$ is a constant depending on $N_0$, $\delta$, $\epsilon$, $\|K\|_1$, $\|K\|_2$,
$\|h\|_\infty$, $\beta$ and $L$.

We now establish a lower bound in Theorem 4.

Theorem 4. For any $T > 0$, $\beta > 0$ and $L > 0$, assume that $h \in \mathcal{H}(\beta, L)$. Then there
exists a constant $C_4 > 0$ such that for any estimator $\hat h_T$ of $h$

$$\sup_{h \in \mathcal{H}(\beta, L)} \mathbb{E}\|\hat h_T - h\|_2^2 \ge C_4 \exp\Big(-\frac{2\beta}{2\beta+1} RT\Big). \tag{32}$$

Contrary to the classical cases of nonparametric estimation (e.g. Tsybakov [36], ...),
the number of observations $M_T$ is a random variable that converges to $+\infty$ when $T \to +\infty$,
which is one of the main difficulties here. From Theorem 3, when $N_0 > 1$ the upper bound
is in $\exp\big(-\frac{2\beta}{2\beta+1} RT\big)$, which is the same rate as the lower bound. The rate of convergence
of $\hat h$ is thus optimal. When $N_0 = 1$, the upper bound is in $\exp\big(-\frac{2\beta}{2\beta+1} (RT - \log(RT))\big)$,
which differs by a logarithmic term from the rate in the lower bound. The rate of convergence
is thus slightly slower than in the case $N_0 > 1$ and our estimator is optimal up to a logarithmic
factor. Furthermore, Theorem 3 illustrates the adaptive properties of our procedure:
it achieves the rate $\varrho(T)^{-\frac{2\beta}{2\beta+1}}$ over the Hölder class $\mathcal{H}(\beta, L)$ as soon as $\beta$ is smaller than
$\beta^*$. So, it automatically adapts to the unknown smoothness of the signal to estimate.
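To make the comparison between the upper and lower rates concrete, the two quantities can be evaluated numerically. The following sketch uses hypothetical function names and leaves out the constants $C_3$ and $C_4$, so only orders of magnitude are meaningful.

```python
import math


def inv_rho(T, R, n0):
    """The quantity rho(T)^{-1} of the oracle inequality."""
    if n0 == 1:
        return math.exp(-R * T + math.log(R * T)) / (1.0 - math.exp(-R * T))
    return math.exp(-R * T)


def upper_rate(T, R, n0, beta):
    """Rate of the upper bound, up to a multiplicative constant."""
    return inv_rho(T, R, n0) ** (2.0 * beta / (2.0 * beta + 1.0))


def lower_rate(T, R, beta):
    """Rate of the lower bound, up to a multiplicative constant."""
    return math.exp(-2.0 * beta / (2.0 * beta + 1.0) * R * T)
```

For $N_0 > 1$ the two rates coincide exactly. For $N_0 = 1$ their ratio is $\big(RT / (1 - e^{-RT})\big)^{2\beta/(2\beta+1)}$, which grows only polynomially in $T$ while both rates decay exponentially.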


3 Numerical simulations 

3.1 Numerical computation of h 

We use the R software to implement simulations with two original distributions of the division
kernel $h$ and compare them with their estimators. On the interval $[0,1]$, the first distribution to
test is Beta(2,2). Beta($a$,$b$) distributions on $[0,1]$ are characterized by their densities

$$h_{\mathrm{Beta}(a,b)}(x) = \frac{x^{a-1} (1 - x)^{b-1}}{B(a,b)}\, 1_{[0,1]}(x),$$

where $B(a,b)$ is the renormalization constant.

Since $h$ is symmetric, we only consider the distributions with $a = b$. Generally, asymmetric
divisions correspond to $a < 1$ and symmetric divisions with kernels concentrated
around $\frac{1}{2}$ correspond to $a > 1$. The smaller the parameter $a$, the more asymmetric the
divisions. For the second density, we choose a Beta mixture distribution:

$$\tfrac{1}{2}\,\mathrm{Beta}(2,6) + \tfrac{1}{2}\,\mathrm{Beta}(6,2).$$

This choice gives us a bimodal density corresponding to very asymmetric divisions.
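The two test densities can be written down and sampled with the standard library alone. The sketch below is in Python rather than R and uses hypothetical helper names; it is meant only to make the definitions concrete.

```python
import math
import random


def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution on (0, 1)."""
    if not 0.0 < x < 1.0:
        return 0.0
    # log of the renormalization constant B(a, b)
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1.0) * math.log(x)
                    + (b - 1.0) * math.log(1.0 - x) - log_norm)


def mixture_pdf(x):
    """Density of the mixture 1/2 Beta(2,6) + 1/2 Beta(6,2)."""
    return 0.5 * beta_pdf(x, 2.0, 6.0) + 0.5 * beta_pdf(x, 6.0, 2.0)


def sample_mixture(rng):
    """Draw one fraction from the mixture: pick a component, then sample."""
    if rng.random() < 0.5:
        return rng.betavariate(2.0, 6.0)
    return rng.betavariate(6.0, 2.0)
```

Both densities are symmetric with respect to $1/2$, as required of a division kernel: exchanging the two components of the mixture amounts to replacing $x$ by $1 - x$.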




Figure 2: (a): MISE's as a function of $\epsilon$. (b): $\hat\ell - \ell_{\mathrm{oracle}}$ as a function of $\epsilon$. The dotted
lines indicate the optimal value of $\epsilon$ which is used in all simulations.


We estimate $h$ by using $\hat h$ and we take the classical Gaussian kernel $K(x) =
(2\pi)^{-1/2} \exp(-x^2/2)$. For the choice of bandwidth, we apply the GL method with the
family $H = \{1, 2^{-1}, \ldots, \lfloor \delta M_T \rfloor^{-1}\}$ for some $\delta > 0$ small enough when $M_T$ is large, to
reduce the time of numerical simulation. We have $\|K\|_1 = 1$, $\|K\|_2 = 2^{-1/2} \pi^{-1/4}$ and
$K_\ell * K_{\ell'} = K_{\sqrt{\ell^2 + \ell'^2}}$, hence it is not difficult to calculate in practice $\hat h_{\ell, \ell'}$ as well as $\hat h_{\ell'}$.
Finally, the value of $\epsilon$ in $\chi = (1 + \epsilon)(1 + \|K\|_1)$ is chosen to find an optimal value of
the MISE. To do this, we implement a preliminary simulation to calibrate $\epsilon$, in which we
choose $\epsilon > -1$ to ensure that $1 + \epsilon > 0$. We compute the MISE and $\hat\ell - \ell_{\mathrm{oracle}}$ as functions
of $\epsilon$, where $\ell_{\mathrm{oracle}} = \operatorname{argmin}_{\ell \in H} \mathbb{E}\big[\|\hat h_\ell - h\|_2^2\big]$ and $h$ is the density of Beta(2,2). In Figure
2a, simulation results show that the risk has its minimum value at $\epsilon = -0.68$. This value is
not justified from a theoretical point of view. The theoretical choice $\epsilon > 0$ (see Theorem
2) does not give bad results, but this choice is too conservative for non-asymptotic practical
purposes, as often met in the literature (see Bertin et al. [9] for more details about
the GL methodology). Moreover, following the discussion in Lacour and Massart [23], we
investigate (see Figure 2b) the difference $\hat\ell - \ell_{\mathrm{oracle}}$ and observe some explosions close to
$\epsilon = -0.68$. Consequently, we choose $\epsilon = -0.68$ for all following simulations.
Figure 3 illustrates a reconstruction for the density of Beta(2, 2) and the beta mixture
$\frac{1}{2}$ Beta(2, 6) + $\frac{1}{2}$ Beta(6, 2) when $T = 13$. We choose here the division rate and the growth
rate $R = 0.5$ and $\alpha = 0.35$ respectively. We compare the estimated densities when using
the GL bandwidth with those estimated with the oracle bandwidth. The oracle bandwidth
is found by assuming that we know the true density. Moreover, the GL estimators are
compared with estimators using the cross-validation (CV) method and the rule of thumb
(RoT). The CV bandwidth is defined as follows:

$$\ell_{CV} = \operatorname*{argmin}_{\ell \in H} \Big\{ \int \hat h_\ell^2(\gamma)\,d\gamma - \frac{2}{M_T} \sum_{i=1}^{M_T} \hat h_{\ell, -i}\big(\Gamma_i^1\big) \Big\},$$

where $\hat h_{\ell, -i}(\gamma) = \frac{1}{M_T - 1} \sum_{j \ne i} K_\ell\big(\gamma - \Gamma_j^1\big)$. The RoT bandwidth can be calculated simply
by using the formula $\ell_{RoT} = 1.06\,\hat\sigma\,n^{-1/5}$, where $\hat\sigma$ is the standard deviation of the sample
$(\Gamma_1^1, \ldots, \Gamma_n^1)$. More details about these methods can be found in Section 3.4 of Silverman
[32] or Tsybakov [36].
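The two competing bandwidth selectors can be sketched as follows for a Gaussian kernel. This is a minimal illustration with hypothetical function names, using the sample standard deviation with divisor $n - 1$ as an assumption. For the Gaussian kernel the integral of the squared estimator has a closed form, because $\int K_\ell(x - u) K_\ell(x - v)\,dx = K_{\sqrt{2}\ell}(u - v)$, so no numerical integration is needed.

```python
import math
import statistics


def gauss(x, s):
    """Gaussian density with standard deviation s, i.e. K_s(x)."""
    return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def rot_bandwidth(sample):
    """Rule-of-thumb bandwidth 1.06 * sigma_hat * n^(-1/5)."""
    n = len(sample)
    return 1.06 * statistics.stdev(sample) * n ** (-0.2)


def cv_score(sample, l):
    """Least-squares cross-validation criterion for bandwidth l."""
    n = len(sample)
    square_term = 0.0     # n^2 times the integral of the squared estimator
    leave_one_out = 0.0   # sum over i of sum over j != i of K_l
    for i, gi in enumerate(sample):
        for j, gj in enumerate(sample):
            d = gi - gj
            square_term += gauss(d, math.sqrt(2.0) * l)
            if i != j:
                leave_one_out += gauss(d, l)
    return square_term / n ** 2 - 2.0 * leave_one_out / (n * (n - 1))


def cv_bandwidth(sample, bandwidths):
    """Bandwidth of the family minimizing the cross-validation criterion."""
    return min(bandwidths, key=lambda l: cv_score(sample, l))
```

The double loop makes the criterion quadratic in the sample size, which remains manageable for a few thousand observations but would need a binned or vectorised version for the largest values of $T$.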




Figure 3: Reconstruction of division kernels with T = 13. (a) Reconstruction of Beta(2, 2).
(b) Reconstruction of beta mixture.


To estimate the MISE, we implement Monte-Carlo simulations with respect to $T =
13, 17$ and $20$. The number of repetitions for each simulation is $M = 100$. Then, we
compute the mean of relative error $\bar e = (1/M) \sum_{i=1}^{M} e_i$ and the standard deviation $\sigma_e$
of $(e_i)_{1 \le i \le M}$, where

$$e_i = \frac{\|\hat h^{(i)} - h\|_2}{\|h\|_2}, \quad i = 1, \ldots, M, \tag{33}$$

and $\hat h^{(i)}$ denotes the estimator of $h$ corresponding to the $i$-th repetition.

The MISE's are computed for estimated densities using the GL bandwidth, the oracle
bandwidth, the CV bandwidth and the RoT bandwidth. For a further comparison, in
the reconstruction of Beta(2,2), we compute the relative error in a parametric setting
by comparing the true density $h$ with the density of Beta($\hat a$, $\hat a$), where $\hat a$ is a Maximum
Likelihood (ML) estimator of $a$. The simulation results are displayed in Table 1 and Table 2.
For the density of the Beta mixture, we only compute the error with $T = 13$ and $T = 17$.
The boxplot in Figure 4 illustrates the MISE's in Table 1 when $T = 17$.

                      GL       Oracle   CV       RoT       ML method
T = 13   $\bar e$     0.1001   0.0840   0.1009   0.0900    0.0610
         $\sigma_e$   0.0585   0.0481   0.0599   0.0577    0.0724
         $\bar\ell$   0.0920   0.0845   0.0824   0.0727
T = 17   $\bar e$     0.0458   0.0397   0.0459   0.0405    0.0166
         $\sigma_e$   0.0260   0.0230   0.0297   0.0237    0.0171
         $\bar\ell$   0.0485   0.0497   0.0478   0.0470
T = 20   $\bar e$     0.0261   0.0241   0.0262   0.0245    0.0088
         $\sigma_e$   0.0140   0.0114   0.0132   0.00121   0.0091
         $\bar\ell$   0.0377   0.0359   0.0345   0.0354

Table 1: Mean of relative error and its standard deviation for the reconstruction of
Beta(2,2). $\bar\ell$ is the average of bandwidths for M = 100 samples.





                      GL       Oracle   CV       RoT
T = 13   $\bar e$     0.1361   0.1245   0.1379   0.1686
         $\sigma_e$   0.0672   0.0562   0.0815   0.0537
         $\bar\ell$   0.0618   0.0527   0.0522   0.0948
T = 17   $\bar e$     0.0539   0.0534   0.0550   0.0919
         $\sigma_e$   0.0180   0.0168   0.0168   0.00223
         $\bar\ell$   0.0309   0.0272   0.0264   0.0590

Table 2: Mean of relative error and its standard deviation for the reconstruction of the beta
mixture $\frac{1}{2}$ Beta(2,6) + $\frac{1}{2}$ Beta(6, 2).

From Tables 1 and 2, we can note that the accuracy of the estimation of Beta(2, 2)
and of the Beta mixture by the GL bandwidth increases for larger $T$. In Figure 5, we illustrate
on a log-log scale the mean relative error and the rate of convergence versus time $T$. This
shows that the error is close to the exponential rate predicted by the theory. Moreover,
we can observe that the errors for the Beta mixture are larger than those for Beta(2, 2) with the
same $T$, due to the complexity of its density. In both cases, the error obtained with the
oracle bandwidth is always the smallest. The GL error is slightly smaller than the CV error.
The RoT error can show very good behavior but lacks stability. Overall, we conclude
that the GL method behaves well when compared to the cross-validation method
and the rule of thumb. As usual, we also see that the ML errors are somewhat smaller than those
of the nonparametric approach, but the magnitude of the mean $\bar e$ remains similar.

Figure 4: Errors of estimated densities of Beta(2,2) when T = 17 (boxplots for GL, Oracle, CV, RoT and ML).

Figure 5: The log-mean relative error for the reconstruction of Beta(2,2) compared to the
log-rate (solid line) computed with $\beta = 1$.


Since $h$ is symmetric on $[0,1]$ with respect to $\frac{1}{2}$, the estimator $\hat h$ can be improved and
we can introduce

$$\tilde h(x) = \frac{\hat h(x) + \hat h(1 - x)}{2},$$

which is symmetric by construction and also satisfies (31). We compute the mean of
relative error for the estimator $\tilde h$ with the estimation of Beta(2, 2) and of the Beta mixture. The
results are displayed in Table 3. Compared with the errors in Tables 1 and 2, one can see, as
expected, that the errors for the reconstruction with $\tilde h$ are smaller. However, these errors are
of the same order, indicating that the estimator $\hat h$ already had good symmetry properties.





                         GL       Oracle   CV       RoT
Beta(2, 2)     T = 13    0.0785   0.0634   0.0762   0.0644
               T = 17    0.0356   0.0309   0.0356   0.0309
Beta mixture   T = 13    0.1117   0.0953   0.1030   0.1584
               T = 17    0.0450   0.0414   0.0417   0.0893

Table 3: Mean of relative error for the reconstruction with $\tilde h$.

3.2 Influence of the distribution on the mean age

For $t > 0$, recall the mean age defined in (13). To study the influence of the distribution
on the mean age, we simulate $n = 50$ trees with respect to $t = 6, 6 + \Delta t, \ldots, 24$ with
$\Delta t = 0.36$. For each sample of mean ages, we compute the average mean, the 1st ($Q_{25}$)
quartile and the 3rd ($Q_{75}$) quartile. Figures 6a and 6b show the simulation results corresponding
to the density of Beta(2,2) with $\alpha = 0.45$ and $R = 0.4$. One can see that the average
of the mean age and the mean age converge to $\frac{\alpha}{R} = 1.125$ for larger $t$. This agrees with the
theoretical result proved in Section 2.2.
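The convergence of the mean age can be reproduced with a very short simulation. The sketch below rests on one observation: a division splits the mother's toxicity into $\gamma x$ and $(1-\gamma)x$, so the total toxicity of the population is unchanged by divisions and only grows at speed $\alpha N_t$. The mean age is then the ratio of this total to $N_t$, and the kernel $h$ does not need to be sampled. The function name and default parameters are hypothetical.

```python
import random


def simulate_mean_age(t_end, alpha, R, n0=10, x0=1.0, seed=1):
    """Mean age (total toxicity / number of cells) at time t_end."""
    rng = random.Random(seed)
    t, n, total = 0.0, n0, n0 * x0
    while True:
        dt = rng.expovariate(R * n)        # next division at rate R * N_t
        if t + dt > t_end:
            total += alpha * n * (t_end - t)
            return total / n
        total += alpha * n * dt            # every cell grows at speed alpha
        t += dt
        n += 1                             # division conserves total toxicity
```

With $\alpha = 0.45$ and $R = 0.4$, a call such as `simulate_mean_age(20.0, 0.45, 0.4)` should return a value close to $\alpha/R = 1.125$. The fact that $h$ never enters the computation is consistent with the observation that the average of mean ages does not change when the kernel distribution is replaced.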





Figure 6: (a) Average mean, 1st and 3rd quartiles for the sample of means for 50 trees. (b)
Average mean, 1st and 3rd quartiles for one tree. (c) Average of $Q_{75} - Q_{25}$ with $a \in [0,2]$
at $t = 12$. (d) Mean age with $a \in [0,2]$ at $t = 12$.


Moreover, $Q_{25}$ and $Q_{75}$ vary when the parameter $a$ changes. In Figure 6c, we draw a
fitted curve of the average of $(Q_{75} - Q_{25})$ when $a$ varies from 0 to 2. As we mentioned in
the introduction, if divisions are more asymmetric, corresponding to small values of $a$, the
toxicities concentrate on few cells, i.e. we have more older cells after the divisions. This
explains the decreasing trend in the average of $(Q_{75} - Q_{25})$. Finally, Figure 6d displays
the average of mean ages with respect to $a$. One can note that it does not change when
we replace the kernel distribution, e.g. Beta(0.6,0.6) instead of Beta(2,2).


4 Proofs 

4.1 Proof of Proposition 1

ii) The proof of ii) can be found easily in the literature. Here we refer to [?], Section 5.3 for
this proof.

i) Let us prove that $\lim_{T \to +\infty} N_T = \lim_{j \to +\infty} T_j = +\infty$. Since our model has only births
and no death, $(N_t)_{t \in [0,T]}$ is a non-decreasing process: $N_{T_j} = N_0 + j$. All the $T_j$'s are finite
and $\lim_{j \to +\infty} N_{T_j} = +\infty$ a.s. From ii), we have $\mathbb{E}[N_T] = N_0 e^{RT}$. Hence, we deduce from
the estimate $\sup_{t \in [0,T]} \mathbb{E}[N_t] < +\infty$ for all $T > 0$ that $T_j \to +\infty$ a.s. as $j \to +\infty$. Then we also have
$\lim_{T \to +\infty} N_T = +\infty$ a.s.

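iii) Let $p = e^{-RT}$. When $N_0 = 1$, $N_T \sim \mathrm{Geom}(p)$. Then we have

$$\mathbb{E}\Big[\frac{1}{N_T}\Big] = \sum_{n=1}^{\infty} \frac{1}{n} \mathbb{P}(N_T = n) = \sum_{n=1}^{\infty} \frac{1}{n}\, p (1-p)^{n-1} = \frac{p}{1-p} \sum_{n=1}^{\infty} \frac{(1-p)^n}{n} = -\frac{p}{1-p} \log(p).$$

Replacing $p$ with $e^{-RT}$, we obtain (10).

When $N_0 > 1$, $N_T \sim \mathcal{NB}(N_0, p)$. Hence, we have

$$\mathbb{E}\Big[\frac{1}{N_T}\Big] = \sum_{n=N_0}^{+\infty} \frac{1}{n} \binom{n-1}{n-N_0} p^{N_0} (1-p)^{n-N_0} = \Big(\frac{p}{1-p}\Big)^{N_0} f(1-p), \tag{34}$$

where $f(x) = \sum_{n=N_0}^{+\infty} \frac{1}{n} \binom{n-1}{n-N_0} x^n$. We differentiate $f(x)$ by taking the derivative under
the sum. Then:

$$\frac{d}{dp} f(1-p) = -\sum_{n=N_0}^{+\infty} \binom{n-1}{n-N_0} (1-p)^{n-1} = -\frac{(1-p)^{N_0-1}}{p^{N_0}} \sum_{n=N_0}^{+\infty} \binom{n-1}{n-N_0} p^{N_0} (1-p)^{n-N_0} = -\frac{1}{p} \Big(\frac{1}{p} - 1\Big)^{N_0-1},$$

since the sum is 1 (we recognize the negative binomial).
Hence,

$$\frac{d}{dp} f(1-p) = (-1)^{N_0} \Big( \frac{1}{p} + \sum_{k=1}^{N_0-1} \binom{N_0-1}{k} (-1)^k \frac{1}{p^{k+1}} \Big). \tag{35}$$

Integrating equation (35) and noticing that $f(0) = 0$, we get

$$f(1-p) = (-1)^{N_0-1} \Big( \sum_{k=1}^{N_0-1} \binom{N_0-1}{k} (-1)^k \frac{p^{-k} - 1}{k} + \log\frac{1}{p} \Big). \tag{36}$$

Combining (34) and (36) and replacing $p$ with $e^{-RT}$, we get (11).

iv) We first prove the lower bound of (12). From (5), taking $f_t(x) = 1$, we have

$$N_t = N_0 + \int_0^t \int_{\mathcal{E}} 1_{\{i \le N_{s-}\}}\, Q(ds, di, d\gamma). \tag{37}$$

Applying the Itô formula for jump processes (see [20], Theorem 5.1 on p.67) to (37), we obtain

$$\frac{1}{N_T} = \frac{1}{N_0} + \int_0^T \int_{\mathcal{E}} \Big( \frac{1}{N_{s-} + 1} - \frac{1}{N_{s-}} \Big) 1_{\{i \le N_{s-}\}}\, Q(ds, di, d\gamma)
= \frac{1}{N_0} - \int_0^T \int_{\mathcal{E}} \frac{1}{N_{s-}(N_{s-} + 1)}\, 1_{\{i \le N_{s-}\}}\, Q(ds, di, d\gamma).$$

Hence,

$$\mathbb{E}\Big[\frac{1}{N_T}\Big] = \frac{1}{N_0} - \mathbb{E}\Big[ \int_0^T \frac{R N_s}{N_s (N_s + 1)}\,ds \Big] = \frac{1}{N_0} - R \int_0^T \mathbb{E}\Big[\frac{1}{N_s + 1}\Big] ds. \tag{38}$$

Since $N_s \ge N_0 \ge 1$, we have $\frac{1}{N_s + 1} \le \frac{1}{N_s}$. Therefore, (38) implies that

$$\mathbb{E}\Big[\frac{1}{N_T}\Big] \ge \frac{1}{N_0} - R \int_0^T \mathbb{E}\Big[\frac{1}{N_s}\Big] ds. \tag{39}$$

By comparison of $\mathbb{E}\big[\frac{1}{N_T}\big]$ with the solutions of the ODE $\frac{d}{dT} u(T) = -R u(T)$ with $u(0) =
1/N_0$, we finally obtain

$$\mathbb{E}\Big[\frac{1}{N_T}\Big] \ge \frac{1}{N_0} e^{-RT}.$$

For the upper bound, notice that $\mathbb{E}\big[\frac{1}{N_T}\big] \le \mathbb{E}\big[\frac{1}{N_T - 1}\big]$ for $N_0 > 1$. Then we have

$$\mathbb{E}\Big[\frac{1}{N_T - 1}\Big] = \sum_{n=N_0}^{+\infty} \frac{1}{n-1} \binom{n-1}{n-N_0} p^{N_0} (1-p)^{n-N_0}
= \sum_{n=N_0}^{+\infty} \frac{(n-2)!}{(n-N_0)!\,(N_0-1)!}\, p^{N_0} (1-p)^{n-N_0}$$
$$= \frac{p}{N_0 - 1} \sum_{n=N_0}^{+\infty} \frac{(n-2)!}{(n-N_0)!\,(N_0-2)!}\, p^{N_0-1} (1-p)^{n-N_0}
= \frac{p}{N_0 - 1} \sum_{m=N_0-1}^{+\infty} \frac{(m-1)!}{(m-(N_0-1))!\,((N_0-1)-1)!}\, p^{N_0-1} (1-p)^{m-(N_0-1)}
= \frac{p}{N_0 - 1} = \frac{e^{-RT}}{N_0 - 1},$$

by changing the index in the sum ($m = n - 1$) and by recognizing the negative binomial
with parameter $(N_0 - 1, p)$. Hence, we conclude that for $N_0 > 1$

$$\frac{e^{-RT}}{N_0} \le \mathbb{E}\Big[\frac{1}{N_T}\Big] \le \frac{e^{-RT}}{N_0 - 1}.$$

This ends the proof of Proposition 1.

4.2 Proof of Lemma 1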

By symmetry of h with respect to 1/2, we have: 


Yt = Yo + + 2i? ^ {jYs - Ys) h{-f)d^ ds + Ut 

= Yo + (^a- 2RYs 7h{-f)dj^ ds + Ut 


= Yo+ {a- RYs)ds + Ut. 

Jo 

where Ut is a square-integrable martingale. 

Let Yt = Yte^^, Yq = Yq. By ltd formula, we get 


a 


R 


Pi = Po + ^ ( - 1) + / e^^dU 




Replacing Yt by Yte^^, we obtain 


a 


r,= |v-„-^ 


) + ^ + 


We end the proof by taking the expectation and the limit as t —?• +oo of Yt to obtain (16) 
and p!?] ). 
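As a sanity check of the mean formula $\mathbb{E}[Y_t] = (Y_0-\alpha/R)e^{-Rt}+\alpha/R$, here is a small Monte Carlo sketch. It assumes the dynamics read off from the generator used in Section 4.3 (linear growth at rate $\alpha$, jumps at rate $2R$ from $y$ to $\gamma y$), and it takes $h$ uniform on $(0,1)$ purely as an illustrative symmetric kernel; the function name, seed, sample size and parameter values are hypothetical.

```python
import math
import random

def simulate_y(y0, alpha, rate, horizon, rng):
    """One path of Y up to time `horizon`: growth at speed alpha,
    jumps at rate 2*rate, each jump multiplying Y by a Uniform(0,1) fraction."""
    t, y = 0.0, y0
    while True:
        wait = rng.expovariate(2.0 * rate)
        if t + wait >= horizon:
            return y + alpha * (horizon - t)
        t += wait
        y = (y + alpha * wait) * rng.random()

# Hypothetical parameters for the illustration.
y0, alpha, rate, horizon = 1.0, 2.0, 1.0, 3.0
rng = random.Random(12345)
n_paths = 20000
empirical_mean = sum(simulate_y(y0, alpha, rate, horizon, rng) for _ in range(n_paths)) / n_paths
closed_form = (y0 - alpha / rate) * math.exp(-rate * horizon) + alpha / rate
print(empirical_mean, closed_form)
```

With these values the empirical mean should be close to the closed form, which is itself already close to the limit $\alpha/R$.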


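4.3 Proof of Theorem 1

We will show that the process $Y$ satisfies the ergodicity and integrability assumptions in
Bansaye et al. [5] (see (H1) - (H4), Section 4). More precisely:

1. $\mathbb{E}[Y_t] < +\infty$ for all $t \ge 0$.

2. There exist $\varpi < R$ and $c > 0$ such that $\mathbb{E}[Y_t^2] < ce^{\varpi t}$ for all $t \ge 0$.

From (16) we note that $\mathbb{E}[Y_t] < +\infty$ for all $t \ge 0$.

To prove the second point, from (14) we have

\[
\begin{aligned}
\mathbb{E}[Y_t^2] &= \mathbb{E}\Big[Y_0^2 + \int_0^t\Big(2\alpha Y_s + 2R\int_0^1(\gamma^2-1)\,Y_s^2\,h(\gamma)\,d\gamma\Big)ds\Big]\\
&= Y_0^2 + 2\alpha\int_0^t\mathbb{E}[Y_s]\,ds - 2\theta R\int_0^t\mathbb{E}[Y_s^2]\,ds,
\end{aligned}
\tag{40}
\]

with $\theta = \int_0^1(1-\gamma^2)h(\gamma)\,d\gamma$ and $0 < \theta < 1$.

Substituting $\mathbb{E}[Y_t] = (Y_0 - \alpha/R)e^{-Rt} + \alpha/R$ into (40), we see that $\mathbb{E}[Y_t^2]$ solves the
following equation:

\[ \frac{d}{dt}\mathbb{E}[Y_t^2] = -2\theta R\,\mathbb{E}[Y_t^2] + 2\alpha\Big(Y_0 - \frac{\alpha}{R}\Big)e^{-Rt} + \frac{2\alpha^2}{R}. \tag{41} \]

The solution of the equation (41) is:

\[ \mathbb{E}[Y_t^2] = e^{-2\theta Rt}\Big(Y_0^2 + \int_0^t e^{2\theta Rs}\Big[2\alpha\Big(Y_0-\frac{\alpha}{R}\Big)e^{-Rs} + \frac{2\alpha^2}{R}\Big]ds\Big). \]

Hence, if $\theta = \frac12$, we have, using $t \le e^{\theta t}$ and $Y_0 \ge 0$,

\[
\begin{aligned}
\mathbb{E}[Y_t^2] &= Y_0^2e^{-Rt} + \Big(2\alpha Y_0 - \frac{2\alpha^2}{R}\Big)te^{-Rt} + \frac{2\alpha^2}{R^2}\big(1-e^{-Rt}\big)\\
&\le Y_0^2e^{-Rt} + \Big(2\alpha Y_0 + \frac{2\alpha^2}{R}\Big)e^{-(R-\theta)t} + \frac{2\alpha^2}{R^2}\\
&\le \Big(Y_0^2 + 2\alpha Y_0 + \frac{2\alpha^2}{R} + \frac{2\alpha^2}{R^2}\Big)e^{\varpi t} = c_1e^{\varpi t},
\end{aligned}
\tag{42}
\]

with $\varpi = 0\vee(\theta - R) := \max(0, \theta - R)$.

If $\theta \ne \frac12$,

\[
\begin{aligned}
\mathbb{E}[Y_t^2] &= Y_0^2e^{-2\theta Rt} + e^{-2\theta Rt}\Big[\Big(2\alpha Y_0 - \frac{2\alpha^2}{R}\Big)\int_0^t e^{(2\theta-1)Rs}\,ds + \frac{2\alpha^2}{R}\int_0^t e^{2\theta Rs}\,ds\Big]\\
&= Y_0^2e^{-2\theta Rt} + \Big(2\alpha Y_0 - \frac{2\alpha^2}{R}\Big)\frac{1}{(2\theta-1)R}\big(e^{-Rt} - e^{-2\theta Rt}\big) + \frac{\alpha^2}{\theta R^2}\big(1 - e^{-2\theta Rt}\big)\\
&\le Y_0^2 + \Big(2\alpha Y_0 + \frac{2\alpha^2}{R}\Big)\frac{1}{|2\theta-1|R} + \frac{\alpha^2}{\theta R^2} = c_2.
\end{aligned}
\]

Thus, if we set $c = \max(c_1, c_2)$ then $\mathbb{E}[Y_t^2] \le ce^{\varpi t}$ for all $t \ge 0$.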
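The closed-form second moment for $\theta \ne 1/2$ can be compared with a direct numerical integration of the ODE (41). The sketch below uses a classical fourth-order Runge-Kutta scheme; the parameter values (in particular $\theta = 2/3$, which corresponds to a uniform $h$) and all function names are assumptions made for the illustration.

```python
import math

# Hypothetical parameter values; theta = 2/3 is the value for h uniform on (0,1).
Y0, ALPHA, R, THETA = 1.0, 2.0, 1.0, 2.0 / 3.0

def rhs(t, v):
    """Right-hand side of the ODE (41) for v(t) = E[Y_t^2]."""
    return -2.0 * THETA * R * v + 2.0 * ALPHA * (Y0 - ALPHA / R) * math.exp(-R * t) + 2.0 * ALPHA**2 / R

def closed_form(t):
    """Explicit solution of (41) when theta != 1/2."""
    decay = math.exp(-2.0 * THETA * R * t)
    return (Y0**2 * decay
            + (2.0 * ALPHA * Y0 - 2.0 * ALPHA**2 / R) * (math.exp(-R * t) - decay) / ((2.0 * THETA - 1.0) * R)
            + ALPHA**2 / (THETA * R**2) * (1.0 - decay))

def rk4(t_end, steps=3000):
    """Integrate (41) from v(0) = Y0^2 with the classical RK4 scheme."""
    dt, t, v = t_end / steps, 0.0, Y0**2
    for _ in range(steps):
        k1 = rhs(t, v)
        k2 = rhs(t + dt / 2, v + dt * k1 / 2)
        k3 = rhs(t + dt / 2, v + dt * k2 / 2)
        k4 = rhs(t + dt, v + dt * k3)
        v += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return v

# Constant c_2 from the bound in the theta != 1/2 case.
c2 = Y0**2 + (2 * ALPHA * Y0 + 2 * ALPHA**2 / R) / (abs(2 * THETA - 1) * R) + ALPHA**2 / (THETA * R**2)
numeric, exact = rk4(3.0), closed_form(3.0)
print(numeric, exact, c2)
```

The numerical and explicit values should agree to high precision, and both stay below the constant $c_2$.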

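The infinitesimal generator $\mathcal{A}$ of $Y$ is defined for test functions $f$ as

\[ \mathcal{A}f(x) = \alpha f'(x) + 2R\int_0^1\big(f(\gamma x) - f(x)\big)h(\gamma)\,d\gamma. \]

For $V(x) = x$ and $f(x) = x+1$, we have, for $x \ge 0$,

\[ \mathcal{A}V(x) = \alpha - Rx \le -\frac{R}{2}f(x) + \Big(\alpha + \frac{R}{2}\Big). \]

Hence, by Theorem 5.3 of Meyn and Tweedie [26], there exists a probability measure $\pi$ on $\mathbb{R}_+$ such that
$\mathbb{E}[Y_t]$ converges as $t\to+\infty$ to the mean of $\pi$, which equals $\frac{\alpha}{R}$ by Lemma 1. Finally, applying Theorem 4.2 of [5], we obtain the result: as $t\to+\infty$, the mean size per cell (total size divided by $N_t$) converges to $\frac{\alpha}{R}$.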

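4.4 Proof of Proposition

To prove (20), let us remark that the number of random divisions $M_T$ is independent of
$(\Gamma^1_i)_{i\in\mathbb{N}^*}$, because the division rate $R$ is constant and because of the construction of our
stochastic process. Therefore, we have

\[
\mathbb{E}\big[\hat h_\ell(\gamma)\,\big|\,M_T\big]
= \mathbb{E}\Big[\frac{1}{M_T}\sum_{i=1}^{M_T}K_\ell(\gamma-\Gamma^1_i)\,\Big|\,M_T\Big]
= \frac{1}{M_T}\,M_T\,\mathbb{E}\big[K_\ell(\gamma-\Gamma^1_1)\big]
= \mathbb{E}\big[K_\ell(\gamma-\Gamma^1_1)\big] = K_\ell\star h(\gamma),
\]

and $\mathbb{E}[\hat h_\ell(\gamma)] = \mathbb{E}\big[\mathbb{E}[\hat h_\ell(\gamma)\,|\,M_T]\big] = K_\ell\star h(\gamma)$. By similar calculations as for (20), we obtain (21).

To prove ii), by the Strong Law of Large Numbers, we have

\[ \frac{1}{n}\sum_{i=1}^{n}K_\ell(\gamma-\Gamma^1_i) \longrightarrow \mathbb{E}\big[K_\ell(\gamma-\Gamma^1_1)\big] \quad\text{a.s. as } n\to+\infty. \]

Moreover, $\lim_{T\to+\infty}N_T = +\infty$ (a.s.). Since $M_T = N_T - N_0$ and $N_0$ is
deterministic, this yields

\[ \frac{1}{M_T}\sum_{i=1}^{M_T}K_\ell(\gamma-\Gamma^1_i) \longrightarrow \mathbb{E}\big[K_\ell(\gamma-\Gamma^1_1)\big] = K_\ell\star h(\gamma) \quad\text{a.s. as } T\to+\infty. \]

This ends the proof of the proposition.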

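4.5 Proof of Proposition

We have

\[ \mathbb{E}\big[\|\hat h_\ell - h\|_2\,\big|\,M_T\big] \le \|h - K_\ell\star h\|_2 + \mathbb{E}\big[\|\hat h_\ell - \mathbb{E}[\hat h_\ell]\|_2\,\big|\,M_T\big]. \]

For the variance term, using that $\mathbb{E}[\hat h_\ell(\gamma)] = K_\ell\star h(\gamma) = \mathbb{E}[\hat h_\ell(\gamma)\,|\,M_T]$, we get

\[
\begin{aligned}
\mathbb{E}\big[\|\hat h_\ell - \mathbb{E}[\hat h_\ell]\|_2^2\,\big|\,M_T\big]
&= \mathbb{E}\Big[\int_{\mathbb{R}}\big|\hat h_\ell(\gamma) - \mathbb{E}[\hat h_\ell(\gamma)]\big|^2d\gamma\,\Big|\,M_T\Big]
= \int_{\mathbb{R}}\mathrm{Var}\big[\hat h_\ell(\gamma)\,\big|\,M_T\big]\,d\gamma\\
&= \frac{1}{M_T}\int_{\mathbb{R}}\mathrm{Var}\big[K_\ell(\gamma-\Gamma^1_1)\big]\,d\gamma
\le \frac{1}{M_T}\int_{\mathbb{R}}\mathbb{E}\big[K_\ell^2(\gamma-\Gamma^1_1)\big]\,d\gamma.
\end{aligned}
\]

By Fubini's theorem, we get

\[
\begin{aligned}
\int_{\mathbb{R}}\mathbb{E}\big[K_\ell^2(\gamma-\Gamma^1_1)\big]\,d\gamma
&= \int_{\mathbb{R}}\int_{\mathbb{R}}K_\ell^2(\gamma-u)\,h(u)\,du\,d\gamma
= \int_{\mathbb{R}}h(u)\Big(\int_{\mathbb{R}}K_\ell^2(\gamma-u)\,d\gamma\Big)du\\
&= \|K_\ell\|_2^2\int_{\mathbb{R}}h(u)\,du = \frac{\|K\|_2^2}{\ell}.
\end{aligned}
\]

Then we have

\[ \mathbb{E}\big[\|\hat h_\ell - \mathbb{E}[\hat h_\ell]\|_2^2\,\big|\,M_T\big] \le \frac{\|K\|_2^2}{M_T\,\ell}. \tag{43} \]

Hence, applying Cauchy-Schwarz's inequality, we obtain (24). This ends the proof of the
proposition.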
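The scaling identity $\|K_\ell\|_2^2 = \|K\|_2^2/\ell$ behind the variance bound (43) is easy to verify numerically. The sketch below assumes a Gaussian kernel, which is only an illustrative choice (the text does not fix $K$); the helper names and the values of $\ell$ and $M_T$ are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Illustrative kernel K (standard Gaussian density); an assumption, not the paper's choice."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def scaled_kernel(x, l):
    """K_l(x) = K(x / l) / l."""
    return gaussian_kernel(x / l) / l

def kernel_estimate(gamma, fractions, l):
    """h_hat_l(gamma) = (1 / M_T) * sum_i K_l(gamma - Gamma_i)."""
    return sum(scaled_kernel(gamma - f, l) for f in fractions) / len(fractions)

def squared_l2_norm(func, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of func^2 over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(func(lo + (j + 0.5) * width) ** 2 for j in range(steps)) * width

l = 0.1
numeric = squared_l2_norm(lambda x: scaled_kernel(x, l), -10 * l, 10 * l)
exact = 1.0 / (2.0 * math.sqrt(math.pi) * l)   # ||K||_2^2 / l for the Gaussian kernel
variance_bound = exact / 50                    # right-hand side of (43) with M_T = 50
print(numeric, exact, variance_bound)
```

For a different kernel only `gaussian_kernel` and the constant $\|K\|_2^2$ would change.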
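4.6 Proof of Theorem 2

This proof is inspired by the proof of Doumic et al. However, our problem here
is that the number of observations $M_T$ is random. To overcome this difficulty, we work
conditionally on $M_T$ to get concentration inequalities.

Hereafter, $\int$ stands for $\int_{\mathbb{R}}$, and since the support of $h$ is $(0,1)$, we can write $\int h(\gamma)\,d\gamma$
instead of $\int_0^1 h(\gamma)\,d\gamma$. Recall that $\hat h = \hat h_{\hat\ell}$ and $\hat h_{\ell,\ell'} = K_\ell\star\hat h_{\ell'}$.

Then, for any $\ell\in H$, we have

\[ \|\hat h - h\|_2 \le A_1 + A_2 + A_3, \]

where

\[
\begin{aligned}
A_1 &:= \|\hat h_{\hat\ell} - \hat h_{\ell,\hat\ell}\|_2 \le A(\ell) + \frac{\chi\|K\|_2}{\sqrt{M_T\hat\ell}},\\
A_2 &:= \|\hat h_{\ell,\hat\ell} - \hat h_\ell\|_2 \le A(\hat\ell) + \frac{\chi\|K\|_2}{\sqrt{M_T\ell}},\\
A_3 &:= \|\hat h_\ell - h\|_2.
\end{aligned}
\]

By definition of $\hat\ell$, we have

\[ A_1 + A_2 \le 2A(\ell) + 2\,\frac{\chi\|K\|_2}{\sqrt{M_T\ell}}, \tag{44} \]

and

\[
\begin{aligned}
A(\ell) &\le \sup_{\ell'\in H}\Big\{\big\|(\hat h_{\ell,\ell'} - \mathbb{E}[\hat h_{\ell,\ell'}]) - (\hat h_{\ell'} - \mathbb{E}[\hat h_{\ell'}])\big\|_2 - \frac{\chi\|K\|_2}{\sqrt{M_T\ell'}}\Big\}_+ + \sup_{\ell'\in H}\big\|\mathbb{E}[\hat h_{\ell,\ell'}] - \mathbb{E}[\hat h_{\ell'}]\big\|_2\\
&\le \zeta_T(\ell) + \sup_{\ell'\in H}\big\|\mathbb{E}[\hat h_{\ell,\ell'}] - \mathbb{E}[\hat h_{\ell'}]\big\|_2,
\end{aligned}
\tag{45}
\]

where

\[ \zeta_T(\ell) = \sup_{\ell'\in H}\Big\{\big\|(\hat h_{\ell,\ell'} - \mathbb{E}[\hat h_{\ell,\ell'}]) - (\hat h_{\ell'} - \mathbb{E}[\hat h_{\ell'}])\big\|_2 - \frac{\chi\|K\|_2}{\sqrt{M_T\ell'}}\Big\}_+. \tag{46} \]

For the term $\sup_{\ell'\in H}\|\mathbb{E}[\hat h_{\ell,\ell'}] - \mathbb{E}[\hat h_{\ell'}]\|_2$, we have

\[
\begin{aligned}
\mathbb{E}[\hat h_{\ell,\ell'}](\gamma) - \mathbb{E}[\hat h_{\ell'}](\gamma)
&= \int(K_\ell\star K_{\ell'})(\gamma-u)\,h(u)\,du - \int K_{\ell'}(\gamma-v)\,h(v)\,dv\\
&= \iint K_\ell(\gamma-u-t)\,K_{\ell'}(t)\,h(u)\,dt\,du - \int K_{\ell'}(\gamma-v)\,h(v)\,dv\\
&= \iint K_\ell(v-u)\,K_{\ell'}(\gamma-v)\,h(u)\,du\,dv - \int K_{\ell'}(\gamma-v)\,h(v)\,dv\\
&= \int K_{\ell'}(\gamma-v)\Big(\int K_\ell(v-u)\,h(u)\,du - h(v)\Big)dv\\
&= \int K_{\ell'}(\gamma-v)\big(K_\ell\star h(v) - h(v)\big)\,dv.
\end{aligned}
\]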
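Hence, we derive

\[ \big\|\mathbb{E}[\hat h_{\ell,\ell'}] - \mathbb{E}[\hat h_{\ell'}]\big\|_2 = \big\|K_{\ell'}\star(K_\ell\star h - h)\big\|_2 \le \|K\|_1\,\|K_\ell\star h - h\|_2, \tag{47} \]

where the right hand side does not depend on $\ell'$, allowing us to take the supremum in the left hand
side.

Thus, (44) - (47) give

\[ A_1 + A_2 \le 2\zeta_T(\ell) + 2\|K\|_1\,\|K_\ell\star h - h\|_2 + 2\,\frac{\chi\|K\|_2}{\sqrt{M_T\ell}}. \]

Then,

\[ \mathbb{E}\big[(A_1+A_2)^2\big] \le 12\,\mathbb{E}\big[\zeta_T^2(\ell)\big] + 12\|K\|_1^2\,\|K_\ell\star h - h\|_2^2 + 12\,\frac{\chi^2\|K\|_2^2}{\ell}\,\mathbb{E}\Big[\frac{1}{M_T}\Big]. \tag{48} \]

For the term $A_3$, we have from (43)

\[ \mathbb{E}\big[A_3^2\big] = \big\|\mathbb{E}[\hat h_\ell] - h\big\|_2^2 + \mathbb{E}\big[\|\hat h_\ell - \mathbb{E}[\hat h_\ell]\|_2^2\big] \le \|K_\ell\star h - h\|_2^2 + \frac{\|K\|_2^2}{\ell}\,\mathbb{E}\Big[\frac{1}{M_T}\Big]. \]

Finally, replacing $\chi$ by $(1+\varepsilon)(1+\|K\|_1)$, we have for any $\ell\in H$

\[
\begin{aligned}
\mathbb{E}\big[\|\hat h - h\|_2^2\big]
&\le 2\,\mathbb{E}\big[(A_1+A_2)^2\big] + 2\,\mathbb{E}\big[A_3^2\big]\\
&\le 24\,\mathbb{E}\big[\zeta_T^2(\ell)\big] + 2\big(1+12\|K\|_1^2\big)\|K_\ell\star h - h\|_2^2
 + 2\big(1+12(1+\varepsilon)^2(1+\|K\|_1)^2\big)\frac{\|K\|_2^2}{\ell}\,\mathbb{E}\Big[\frac{1}{M_T}\Big]\\
&\le 24\,\mathbb{E}\big[\zeta_T^2(\ell)\big] + C_1\Big(\|K_\ell\star h - h\|_2^2 + \frac{\|K\|_2^2}{\ell}\,\mathbb{E}\Big[\frac{1}{M_T}\Big]\Big),
\end{aligned}
\tag{49}
\]

with $C_1$ a constant depending on $\varepsilon$ and $\|K\|_1$.

It remains to deal with the term $\mathbb{E}[\zeta_T^2(\ell)]$, where $\zeta_T(\ell)$ is defined in (46). Using Young's inequality $\|K_\ell\star(\hat h_{\ell'} - \mathbb{E}[\hat h_{\ell'}])\|_2 \le \|K\|_1\|\hat h_{\ell'} - \mathbb{E}[\hat h_{\ell'}]\|_2$, we have

\[
\begin{aligned}
\zeta_T(\ell) &\le \sup_{\ell'\in H}\Big\{\|\hat h_{\ell,\ell'} - \mathbb{E}[\hat h_{\ell,\ell'}]\|_2 + \|\hat h_{\ell'} - \mathbb{E}[\hat h_{\ell'}]\|_2 - \frac{\chi\|K\|_2}{\sqrt{M_T\ell'}}\Big\}_+\\
&\le \sup_{\ell'\in H}\Big\{\|K\|_1\,\|\hat h_{\ell'} - \mathbb{E}[\hat h_{\ell'}]\|_2 + \|\hat h_{\ell'} - \mathbb{E}[\hat h_{\ell'}]\|_2 - \frac{\chi\|K\|_2}{\sqrt{M_T\ell'}}\Big\}_+\\
&\le \sup_{\ell'\in H}\Big\{(1+\|K\|_1)\,\|\hat h_{\ell'} - \mathbb{E}[\hat h_{\ell'}]\|_2 - \frac{(1+\varepsilon)(1+\|K\|_1)\|K\|_2}{\sqrt{M_T\ell'}}\Big\}_+\\
&\le (1+\|K\|_1)\,S_T,
\end{aligned}
\]

where

\[ S_T := \sup_{\ell\in H}\Big\{\|\hat h_\ell - \mathbb{E}[\hat h_\ell]\|_2 - \frac{(1+\varepsilon)\|K\|_2}{\sqrt{M_T\ell}}\Big\}_+. \]

Hence,

\[ \mathbb{E}\big[\zeta_T^2(\ell)\big] \le (1+\|K\|_1)^2\,\mathbb{E}\big[\mathbb{E}[S_T^2\,|\,M_T]\big]. \tag{50} \]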
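If we show that

\[ \mathbb{E}\big[S_T^2\,\big|\,M_T = n\big] \le \frac{C^*}{n}, \tag{51} \]

then

\[ \mathbb{E}\big[\zeta_T^2(\ell)\big] \le C^*(1+\|K\|_1)^2\,\mathbb{E}\Big[\frac{1}{M_T}\Big], \tag{52} \]

where $C^*$ is a constant.

Let us establish (51). When $M_T = n$, for all $n\in\mathbb{N}^*$, we set

\[ \mathbb{E}[S^2] = \mathbb{E}\big[S_T^2\,\big|\,M_T = n\big], \]

where

\[ S := \sup_{\ell\in H}\Big\{\|Z_\ell\|_2 - \frac{(1+\varepsilon)\|K\|_2}{\sqrt{n\ell}}\Big\}_+ \]

with

\[ Z_\ell = \hat h_\ell - \mathbb{E}[\hat h_\ell] = \frac{1}{n}\sum_{i=1}^{n}\Big(K_\ell(\cdot-\Gamma^1_i) - \mathbb{E}\big[K_\ell(\cdot-\Gamma^1_i)\big]\Big). \]

Then,

\[
\begin{aligned}
\mathbb{E}[S^2] &= \mathbb{E}\Big[\sup_{\ell\in H}\Big\{\|Z_\ell\|_2 - \frac{(1+\varepsilon)\|K\|_2}{\sqrt{n\ell}}\Big\}_+^2\Big]
= \int_0^{+\infty}\mathbb{P}\Big(\sup_{\ell\in H}\Big\{\|Z_\ell\|_2 - \frac{(1+\varepsilon)\|K\|_2}{\sqrt{n\ell}}\Big\}_+^2 > x\Big)\,dx\\
&\le \sum_{\ell\in H}\int_0^{+\infty}\mathbb{P}\Big(\Big\{\|Z_\ell\|_2 - \frac{(1+\varepsilon)\|K\|_2}{\sqrt{n\ell}}\Big\}_+^2 > x\Big)\,dx.
\end{aligned}
\]

We bound this with Talagrand's inequality.

Let $\mathcal{A}$ be a countable dense subset of the unit ball of $L^2([0,1])$. We express the norm $\|Z_\ell\|_2$ as

\[ \|Z_\ell\|_2 = \sup_{a\in\mathcal{A}}\int a(\gamma)Z_\ell(\gamma)\,d\gamma = \sup_{a\in\mathcal{A}}\sum_{i=1}^{n}\frac{1}{n}\int a(\gamma)\Big(K_\ell(\gamma-\Gamma^1_i) - \mathbb{E}\big[K_\ell(\gamma-\Gamma^1_i)\big]\Big)\,d\gamma. \]

Let

\[ V_i^a = \frac{1}{n}\int a(\gamma)\Big(K_\ell(\gamma-\Gamma^1_i) - \mathbb{E}\big[K_\ell(\gamma-\Gamma^1_i)\big]\Big)\,d\gamma. \]

Then $V_i^a$, $i = 1,\dots,n$, is a sequence of i.i.d. random variables with zero mean. Thus, we
can apply Talagrand's inequality (see [25, p. 170]) to $\|Z_\ell\|_2 = \sup_{a\in\mathcal{A}}\sum_{i=1}^{n}V_i^a$: for all $\eta, x > 0$, one has

\[ \mathbb{P}\Big(\|Z_\ell\|_2 \ge (1+\eta)\,\mathbb{E}\big[\|Z_\ell\|_2\big] + \sqrt{2\nu x} + c(\eta)\,b\,x\Big) \le e^{-x}, \]

where $c(\eta) = 1/3 + \eta^{-1}$,

\[ \nu = \frac{1}{n}\sup_{a\in\mathcal{A}}\mathbb{E}\Big[\Big(\int a(\gamma)\Big(K_\ell(\gamma-\Gamma^1_1) - \mathbb{E}\big[K_\ell(\gamma-\Gamma^1_1)\big]\Big)\,d\gamma\Big)^2\Big] \]

and

\[ b = \frac{1}{n}\sup_{y\in(0,1),\,a\in\mathcal{A}}\int a(\gamma)\Big(K_\ell(\gamma-y) - \mathbb{E}\big[K_\ell(\gamma-\Gamma^1_1)\big]\Big)\,d\gamma. \]

Next, we calculate the terms $\mathbb{E}[\|Z_\ell\|_2]$, $\nu$ and $b$. Applying Cauchy-Schwarz's inequality
and using independence of the variables, we get

\[
\begin{aligned}
\mathbb{E}\big[\|Z_\ell\|_2\big] &\le \Big(\mathbb{E}\big[\|Z_\ell\|_2^2\big]\Big)^{1/2}
= \Big(\int\mathbb{E}\Big[\Big(\frac{1}{n}\sum_{i=1}^{n}\Big(K_\ell(\gamma-\Gamma^1_i) - \mathbb{E}\big[K_\ell(\gamma-\Gamma^1_i)\big]\Big)\Big)^2\Big]d\gamma\Big)^{1/2}\\
&= \Big(\frac{1}{n}\int\mathbb{E}\Big[\Big(K_\ell(\gamma-\Gamma^1_1) - \mathbb{E}\big[K_\ell(\gamma-\Gamma^1_1)\big]\Big)^2\Big]d\gamma\Big)^{1/2}
\le \frac{\|K\|_2}{\sqrt{n\ell}}.
\end{aligned}
\]

For the term $\nu$, we have

\[
\begin{aligned}
\nu &\le \frac{1}{n}\sup_{a\in\mathcal{A}}\mathbb{E}\Big[\Big(\int a(\gamma)K_\ell(\gamma-\Gamma^1_1)\,d\gamma\Big)^2\Big]\\
&\le \frac{1}{n}\sup_{a\in\mathcal{A}}\mathbb{E}\Big[\int|K_\ell(\gamma-\Gamma^1_1)|\,d\gamma\times\int a^2(\gamma)\,|K_\ell(\gamma-\Gamma^1_1)|\,d\gamma\Big]\\
&\le \frac{\|K\|_1}{n}\sup_{a\in\mathcal{A}}\mathbb{E}\Big[\int a^2(\gamma)\,|K_\ell(\gamma-\Gamma^1_1)|\,d\gamma\Big]\\
&= \frac{\|K\|_1}{n}\sup_{a\in\mathcal{A}}\iint a^2(\gamma)\,|K_\ell(\gamma-u)|\,h(u)\,du\,d\gamma\\
&\le \frac{\|h\|_\infty\|K\|_1^2}{n}.
\end{aligned}
\]

For the term $b$, we have

\[
b = \frac{1}{n}\sup_{y\in(0,1)}\big\|K_\ell(\cdot-y) - \mathbb{E}[K_\ell(\cdot-\Gamma^1_1)]\big\|_2
\le \frac{1}{n}\Big(\sup_{y\in(0,1)}\|K_\ell(\cdot-y)\|_2 + \Big(\mathbb{E}\Big[\int K_\ell^2(\gamma-\Gamma^1_1)\,d\gamma\Big]\Big)^{1/2}\Big)
\le \frac{2\|K\|_2}{n\sqrt{\ell}}.
\]

So, for all $\eta, x > 0$, we have

\[ \mathbb{P}\Big(\|Z_\ell\|_2 \ge (1+\eta)\frac{\|K\|_2}{\sqrt{n\ell}} + \|K\|_1\sqrt{\frac{2\|h\|_\infty x}{n}} + 2c(\eta)\frac{\|K\|_2\,x}{n\sqrt{\ell}}\Big) \le e^{-x}. \]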
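Let $(W_\ell)_{\ell\in H}$ be some strictly positive weights; we apply the previous inequality to $x = W_\ell + u$
for $u > 0$. We have

\[ \mathbb{P}\Big(\|Z_\ell\|_2 \ge (1+\eta)\frac{\|K\|_2}{\sqrt{n\ell}} + \|K\|_1\sqrt{\frac{2\|h\|_\infty(W_\ell+u)}{n}} + 2c(\eta)\frac{\|K\|_2(W_\ell+u)}{n\sqrt{\ell}}\Big) \le e^{-W_\ell-u}. \]

If we set

\[ M_\ell = (1+\eta)\frac{\|K\|_2}{\sqrt{n\ell}} + \|K\|_1\sqrt{\frac{2\|h\|_\infty W_\ell}{n}} + 2c(\eta)\frac{\|K\|_2W_\ell}{n\sqrt{\ell}}, \]

then, since $\sqrt{W_\ell+u} \le \sqrt{W_\ell} + \sqrt{u}$,

\[ \mathbb{P}\Big(\|Z_\ell\|_2 - M_\ell \ge \|K\|_1\sqrt{\frac{2\|h\|_\infty u}{n}} + 2c(\eta)\frac{\|K\|_2\,u}{n\sqrt{\ell}}\Big) \le e^{-W_\ell-u}. \]

Let

\[ A = \mathbb{E}\Big[\sup_{\ell\in H}\big(\|Z_\ell\|_2 - M_\ell\big)_+^2\Big]. \]

An upper bound of $A$ is given by

\[ A = \int_0^{+\infty}\mathbb{P}\Big(\sup_{\ell\in H}\big(\|Z_\ell\|_2 - M_\ell\big)_+^2 > x\Big)\,dx \le \sum_{\ell\in H}\int_0^{+\infty}\mathbb{P}\Big(\big(\|Z_\ell\|_2 - M_\ell\big)_+^2 > x\Big)\,dx. \]

Let us take $u$ such that

\[ x = f(u)^2, \qquad f(u) = \|K\|_1\sqrt{\frac{2\|h\|_\infty u}{n}} + 2c(\eta)\frac{\|K\|_2\,u}{n\sqrt{\ell}}. \]

So,

\[ dx = 2f(u)\Big(\|K\|_1\sqrt{\frac{\|h\|_\infty}{2nu}} + 2c(\eta)\frac{\|K\|_2}{n\sqrt{\ell}}\Big)\,du. \]

Hence, using that the last bracket is at most $f(u)/u$,

\[
\begin{aligned}
A &\le \sum_{\ell\in H}\int_0^{+\infty}e^{-W_\ell-u}\,2f(u)\Big(\|K\|_1\sqrt{\frac{\|h\|_\infty}{2nu}} + 2c(\eta)\frac{\|K\|_2}{n\sqrt{\ell}}\Big)\,du\\
&\le 2\sum_{\ell\in H}e^{-W_\ell}\int_0^{+\infty}f(u)^2e^{-u}u^{-1}\,du\\
&\le C_\eta\sum_{\ell\in H}e^{-W_\ell}\Big(\|h\|_\infty\|K\|_1^2\int_0^{+\infty}e^{-u}\,du + \frac{\|K\|_2^2}{n\ell}\int_0^{+\infty}ue^{-u}\,du\Big)\times\frac{1}{n}\\
&\le C_\eta\sum_{\ell\in H}e^{-W_\ell}\Big(\|h\|_\infty\|K\|_1^2 + \frac{\|K\|_2^2}{n\ell}\Big)\times\frac{1}{n},
\end{aligned}
\tag{53}
\]

where $C_\eta$ is a constant depending only on $\eta$.

We need to choose $W_\ell$ and $\eta$ such that

\[ \mathbb{E}[S^2] = \mathbb{E}\Big[\sup_{\ell\in H}\Big\{\|Z_\ell\|_2 - \frac{(1+\varepsilon)\|K\|_2}{\sqrt{n\ell}}\Big\}_+^2\Big] \le A. \tag{54} \]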
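Let $\theta > 0$; we choose

\[ W_\ell = \frac{\theta^2\|K\|_2^2}{2\|h\|_\infty\|K\|_1^2\sqrt{\ell}}, \]

then we have

\[ M_\ell = (1+\eta)\frac{\|K\|_2}{\sqrt{n\ell}} + \frac{\theta\|K\|_2}{\sqrt{n}\,\ell^{1/4}} + \frac{c(\eta)\theta^2\|K\|_2^3}{\|h\|_\infty\|K\|_1^2\,n\ell}. \]

Obviously, the series in (53) is finite and for any $\ell\in H$, since $\ell \le 1$, we have

\[ M_\ell \le \Big(1 + \eta + \theta + \frac{c(\eta)\theta^2\|K\|_2^2}{\|h\|_\infty\|K\|_1^2\sqrt{n\ell}}\Big)\frac{\|K\|_2}{\sqrt{n\ell}}. \]

Since $H \subset \{\Delta^{-1} : \Delta = 1,\dots,\Delta_{\max}\}$, if we choose $\Delta_{\max} = \lfloor\delta n\rfloor$ for some $\delta > 0$, then
$\ell_{\min} = \Delta_{\max}^{-1}$, so that $n\ell \ge 1/\delta$ for every $\ell\in H$, and we obtain

\[ M_\ell \le \Big(1 + \eta + \theta + \frac{c(\eta)\theta^2\|K\|_2^2\sqrt{\delta}}{\|h\|_\infty\|K\|_1^2}\Big)\frac{\|K\|_2}{\sqrt{n\ell}}. \]

It remains to choose $\eta = \varepsilon/2$ and $\theta$ small enough such that

\[ \theta + \frac{c(\eta)\theta^2\|K\|_2^2\sqrt{\delta}}{\|h\|_\infty\|K\|_1^2} = \frac{\varepsilon}{2}, \]

then

\[ M_\ell \le (1+\varepsilon)\frac{\|K\|_2}{\sqrt{n\ell}}, \]

and we get

\[ \mathbb{E}[S^2] = \mathbb{E}\Big[\sup_{\ell\in H}\Big\{\|Z_\ell\|_2 - \frac{(1+\varepsilon)\|K\|_2}{\sqrt{n\ell}}\Big\}_+^2\Big] \le C^*\times\frac{1}{n}, \]

where $C^*$ is a constant depending on $\delta$, $\varepsilon$, $\|h\|_\infty$, $\|K\|_1$ and $\|K\|_2$. Hence, we get (52).
Combining (49) and (52), we obtain

\[ \mathbb{E}\big[\|\hat h - h\|_2^2\big] \le C_1\Big(\|K_\ell\star h - h\|_2^2 + \frac{\|K\|_2^2}{\ell}\,\mathbb{E}\Big[\frac{1}{M_T}\Big]\Big) + 24\,C^*(1+\|K\|_1)^2\,\mathbb{E}\Big[\frac{1}{M_T}\Big]. \]

Moreover, since $N_T > N_0$, that is $N_T \ge N_0+1$, we have

\[
\begin{aligned}
\mathbb{E}\Big[\frac{1}{M_T}\Big] &= \mathbb{E}\Big[\frac{1}{N_T-N_0}\Big] = \mathbb{E}\Big[\frac{N_T}{N_T-N_0}\cdot\frac{1}{N_T}\Big] = \mathbb{E}\Big[\frac{1}{1-\frac{N_0}{N_T}}\cdot\frac{1}{N_T}\Big]\\
&\le \mathbb{E}\Big[\frac{1}{1-\frac{N_0}{N_0+1}}\cdot\frac{1}{N_T}\Big] = (N_0+1)\,\mathbb{E}\Big[\frac{1}{N_T}\Big].
\end{aligned}
\tag{55}
\]

Then, using (10), (12) and (55), and recalling the definition of $\varrho(T)^{-1}$ in (29), we obtain for
any $\ell\in H$

\[ \mathbb{E}\big[\|\hat h - h\|_2^2\big] \le C_1\Big(\|K_\ell\star h - h\|_2^2 + \frac{\|K\|_2^2}{\ell}\,\varrho(T)^{-1}\Big) + C_2\,\varrho(T)^{-1}. \]

This ends the proof of Theorem 2.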
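The bandwidth selection rule analysed in this proof can be sketched in a few lines. The code below is only an illustration under explicit assumptions: a Gaussian kernel (so that $K_\ell\star K_{\ell'}$ is again Gaussian with standard deviation $\sqrt{\ell^2+\ell'^2}$, $\|K\|_1 = 1$ and $\|K\|_2^2 = 1/(2\sqrt{\pi})$), $L^2$ norms approximated by a Riemann sum on a finite grid, and synthetic division fractions drawn from a Beta(2,2) law. None of these choices, nor the function names, come from the original text.

```python
import math
import random

def gaussian_density(x, sd):
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def kde_on_grid(samples, sd, grid):
    """Kernel estimate (1/n) * sum_i K_sd(g - sample_i) evaluated on the grid."""
    n = len(samples)
    return [sum(gaussian_density(g - s, sd) for s in samples) / n for g in grid]

def l2_distance(u, v, step):
    """Riemann-sum approximation of the L2 distance between two grid functions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) * step)

def select_bandwidth(samples, bandwidths, grid, step, eps=0.5):
    """Return the minimiser over l of A(l) + chi * ||K||_2 / sqrt(n * l)."""
    n = len(samples)
    norm_k2 = (2.0 * math.sqrt(math.pi)) ** -0.5    # ||K||_2 for the Gaussian kernel
    chi = (1.0 + eps) * (1.0 + 1.0)                 # (1 + eps) * (1 + ||K||_1)
    penalty = {l: chi * norm_k2 / math.sqrt(n * l) for l in bandwidths}
    single = {l: kde_on_grid(samples, l, grid) for l in bandwidths}
    criterion = {}
    for l in bandwidths:
        a_l = 0.0                                   # starting at 0 implements the positive part
        for lp in bandwidths:
            # K_l * h_hat_lp is a Gaussian kernel estimate with sd sqrt(l^2 + lp^2).
            double = kde_on_grid(samples, math.hypot(l, lp), grid)
            a_l = max(a_l, l2_distance(double, single[lp], step) - penalty[lp])
        criterion[l] = a_l + penalty[l]
    return min(bandwidths, key=lambda l: criterion[l])

rng = random.Random(0)
samples = [rng.betavariate(2, 2) for _ in range(80)]   # synthetic division fractions
bandwidths = [1.0 / d for d in range(2, 9)]            # H = {1/Delta : Delta = 2..8}
step = 0.025
grid = [-2.0 + step * j for j in range(201)]           # grid on [-2, 3]
selected = select_bandwidth(samples, bandwidths, grid, step)
print(selected)
```

The double loop over $(\ell, \ell')$ mirrors the supremum in the definition of $A(\ell)$; for a kernel without a closed-form self-convolution, the term `double` would have to be computed by numerical convolution instead.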
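4.7 Proof of Theorem 3

We begin with the bias term $\|K_\ell\star h - h\|_2$ in the right hand side of the oracle inequality.
For any $\ell\in H$ and $\gamma\in(0,1)$, let $k = \lfloor\beta\rfloor$ and $b(\gamma) = K_\ell\star h(\gamma) - h(\gamma)$; then we have

\[ h(\gamma+u\ell) = h(\gamma) + h'(\gamma)\,u\ell + \dots + \frac{(u\ell)^k}{(k-1)!}\int_0^1(1-\theta)^{k-1}h^{(k)}(\gamma+\theta u\ell)\,d\theta. \]

Since $K$ is a kernel of order $\beta^*$ and $\beta\in(0,\beta^*)$, we get

\[ b(\gamma) = \int K(u)\,\frac{(u\ell)^k}{(k-1)!}\Big[\int_0^1(1-\theta)^{k-1}\big(h^{(k)}(\gamma+\theta u\ell) - h^{(k)}(\gamma)\big)\,d\theta\Big]du. \]

Set $E_{k,\ell}(u) = |K(u)|\,\frac{|u\ell|^k}{(k-1)!}$ for the sake of notation. Since $h\in\mathcal{H}(\beta,L)$ and applying
twice the generalized Minkowski inequality, we obtain

\[
\begin{aligned}
\|h - \mathbb{E}[\hat h_\ell]\|_2^2 = \int b^2(\gamma)\,d\gamma
&\le \int\Big(\int E_{k,\ell}(u)\int_0^1(1-\theta)^{k-1}\big|h^{(k)}(\gamma+\theta u\ell) - h^{(k)}(\gamma)\big|\,d\theta\,du\Big)^2d\gamma\\
&\le \Big(\int E_{k,\ell}(u)\Big[\int\Big(\int_0^1(1-\theta)^{k-1}\big|h^{(k)}(\gamma+\theta u\ell) - h^{(k)}(\gamma)\big|\,d\theta\Big)^2d\gamma\Big]^{1/2}du\Big)^2\\
&\le \Big(\int E_{k,\ell}(u)\int_0^1(1-\theta)^{k-1}\Big[\int\big|h^{(k)}(\gamma+\theta u\ell) - h^{(k)}(\gamma)\big|^2d\gamma\Big]^{1/2}d\theta\,du\Big)^2\\
&\le \Big(\int E_{k,\ell}(u)\int_0^1(1-\theta)^{k-1}L\,(\theta|u|\ell)^{\beta-k}\,d\theta\,du\Big)^2\\
&\le \Big(\int|K(u)|\,\frac{|u\ell|^k}{(k-1)!}\int_0^1(1-\theta)^{k-1}L\,|u\ell|^{\beta-k}\,d\theta\,du\Big)^2\\
&\le C_{K,L,\beta}\,\ell^{2\beta},
\end{aligned}
\]

where $C_{K,L,\beta} = \Big(\frac{L}{k!}\int|u|^\beta|K(u)|\,du\Big)^2$.

Finally, we have

\[ \mathbb{E}\big[\|\hat h - h\|_2^2\big] \le C_1\inf_{\ell\in H}\Big\{C_{K,L,\beta}\,\ell^{2\beta} + \frac{\|K\|_2^2}{\ell}\,\varrho(T)^{-1}\Big\} + C_2\,\varrho(T)^{-1}. \tag{56} \]

Taking the derivative of the expression inside the infimum of (56) with respect to $\ell$, we obtain
the minimizer

\[ \tilde\ell = \Big(\frac{\|K\|_2^2}{2\beta\,C_{K,L,\beta}}\Big)^{\frac{1}{2\beta+1}}\varrho(T)^{-\frac{1}{2\beta+1}}. \]

The optimal bandwidth $\tilde\ell$ is proportional to $\ell^*$ up to a multiplicative constant. Therefore,
by substituting $\ell$ by $\tilde\ell$ in the right hand side of (56), we obtain

\[ \mathbb{E}\big[\|\hat h - h\|_2^2\big] \le C_3\,\varrho(T)^{-\frac{2\beta}{2\beta+1}}, \]

with $C_3$ a constant depending on $N_0$, $\delta$, $\varepsilon$, $\|K\|_1$, $\|K\|_2$, $\|h\|_\infty$, $\beta$ and $L$. This ends the
proof of Theorem 3.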
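The bias-variance trade-off inside the infimum of (56) can be illustrated numerically. In the sketch below, `c_bias` stands for $C_{K,L,\beta}$, `norm_k2_sq` for $\|K\|_2^2$ and `rho` for $\varrho(T)$; the numerical values are arbitrary placeholders chosen for the example.

```python
def risk_bound(l, beta, c_bias, norm_k2_sq, rho):
    """Expression inside the infimum of (56): squared-bias term plus variance term."""
    return c_bias * l ** (2 * beta) + norm_k2_sq / (l * rho)

def optimal_bandwidth(beta, c_bias, norm_k2_sq, rho):
    """Zero of the derivative of risk_bound with respect to l."""
    return (norm_k2_sq / (2 * beta * c_bias * rho)) ** (1.0 / (2 * beta + 1))

# Placeholder constants, for illustration only.
beta, c_bias, norm_k2_sq = 1.5, 2.0, 0.3
for rho in (1e2, 1e4, 1e6):
    l_opt = optimal_bandwidth(beta, c_bias, norm_k2_sq, rho)
    rescaled = risk_bound(l_opt, beta, c_bias, norm_k2_sq, rho) * rho ** (2 * beta / (2 * beta + 1))
    print(rho, l_opt, rescaled)
```

The rescaled risk printed in the last column should not depend on `rho`, which is the numerical counterpart of the rate $\varrho(T)^{-2\beta/(2\beta+1)}$.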


4.8 Proof of Theorem 4

For $T > 0$, let us denote by $\hat h_T$ the estimator of $h$. To prove Theorem 4, we apply the
general reduction scheme proposed by Tsybakov [36] (Section 2.2, p.79). We will show the
existence of a family $\mathcal{H}_{m,T} = \{h_{j,T} : j = 0,1,\dots,m\}$ such that:

1) $h_{j,T}\in\mathcal{H}(\beta,L)$, $j = 0,\dots,m$.

2) $\|h_{j,T} - h_{k,T}\|_2 \ge 2c\,e^{-\frac{\beta}{2\beta+1}RT}$, $0 \le j < k \le m$.

3) $\frac{1}{m}\sum_{j=1}^{m}K(P_j,P_0) \le \vartheta\log(m)$ for $0 < \vartheta < 1/8$. $P_j$ and $P_0$ are the distributions of the
observations when the division kernels are $h_{j,T}$ and $h_0$, respectively. $K(P,Q)$ denotes
the Kullback-Leibler divergence between two measures $P$ and $Q$:

\[ K(P,Q) = \begin{cases}\displaystyle\int\log\frac{dP}{dQ}\,dP, & \text{if } P\ll Q,\\[2mm] +\infty, & \text{otherwise.}\end{cases} \]

Under the preceding conditions 1, 2, 3, Tsybakov [36] (Theorem 2.5, p.99) shows that

\[ \inf_{\hat h_T}\max_{h\in\mathcal{H}_{m,T}}\mathbb{P}\Big(\|\hat h_T - h\|_2 \ge c\,e^{-\frac{\beta}{2\beta+1}RT}\Big) \ge C', \tag{57} \]

where the infimum is taken over all estimators $\hat h_T$ and the positive constant $C'$ is independent
of $T$. This will be sufficient to obtain Theorem 4 by [36, Theorem 2.7]. The proof ends
with proposing a family $\mathcal{H}_{m,T}$ and checking the assumptions 1, 2, 3.

Construction of the family $\mathcal{H}_{m,T}$:

The idea is to construct a family of perturbations around $h_0$, which is a symmetric density
with respect to $\frac12$ and belongs to $\mathcal{H}(\beta,L)$. For simplicity, we choose $h_0(\gamma) = \mathbf{1}_{(0,1)}(\gamma)$.

Let $c_0 > 0$ be a real number, and for $\gamma\in(0,1)$ let $f(\gamma) = LD^{-\beta}g(D\gamma)$, where $g$ is a
regular function having support $(0,1)$ with $\int g(\gamma)\,d\gamma = 0$ and $g\in\mathcal{H}(\beta,1)$. We define

\[ D = \big\lceil c_0\,e^{\frac{RT}{2\beta+1}}\big\rceil \quad\text{and}\quad f_k(\gamma) = f\Big(\gamma - \frac{k-1}{D}\Big),\quad k = 1,\dots,D. \]

By definition, the functions $f_k$ have disjoint supports and one can check that the functions
$f_k\in\mathcal{H}(\beta,L)$.

Then, the function $h_{j,T}$ will be chosen in

\[ \mathcal{D} = \Big\{h_\delta(\gamma) = h_0(\gamma) + c_1\sum_{k=1}^{D}\delta_kf_k(\gamma) : \delta = (\delta_1,\dots,\delta_D)\in\{0,1\}^D\Big\}, \tag{58} \]

where

\[ c_1 = \min\Big\{\frac{1}{L\|g\|_\infty},\,1\Big\}. \]
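We now check that $h_\delta$ is a density. Since $\int h_\delta(\gamma)\,d\gamma = \int h_0(\gamma)\,d\gamma = 1$, it remains to
verify that $h_\delta(\gamma) \ge 0$ for all $\gamma$. We have

\[
\begin{aligned}
\inf_{(0,1)}h_\delta(\gamma) &\ge \inf_{(0,1)}h_0 - \Big\|c_1\sum_{k=1}^{D}\delta_kf_k\Big\|_\infty\\
&\ge 1 - c_1LD^{-\beta}\max_k\sup_\gamma|\delta_k|\,\big|g\big(D\gamma-(k-1)\big)\big|\\
&\ge 1 - c_1LD^{-\beta}\|g\|_\infty \ge 0,
\end{aligned}
\]

by the choice of $c_1$. Thus the family of densities $\mathcal{D}$ is well-defined.

1) The condition $h_{j,T}\in\mathcal{H}(\beta,L)$:

Let us denote $q = \lfloor\beta\rfloor$; then for all $\gamma,\gamma'\in(0,1)$ we have

\[
\begin{aligned}
\big|h_\delta^{(q)}(\gamma) - h_\delta^{(q)}(\gamma')\big|
&= \Big|h_0^{(q)}(\gamma) - h_0^{(q)}(\gamma') + c_1\sum_{k=1}^{D}\delta_k\big(f_k^{(q)}(\gamma) - f_k^{(q)}(\gamma')\big)\Big|\\
&\le c_1\sum_{k=1}^{D}|\delta_k|\,\big|f_k^{(q)}(\gamma) - f_k^{(q)}(\gamma')\big|\\
&\le c_1\max_k\big|f_k^{(q)}(\gamma) - f_k^{(q)}(\gamma')\big|\\
&\le c_1LD^{-\beta}\max_kD^{q}\,\big|g^{(q)}\big(D\gamma-(k-1)\big) - g^{(q)}\big(D\gamma'-(k-1)\big)\big|\\
&\le c_1LD^{q-\beta}\,|D\gamma - D\gamma'|^{\beta-\lfloor\beta\rfloor} \le L\,|\gamma-\gamma'|^{\beta-\lfloor\beta\rfloor},
\end{aligned}
\]

which is always satisfied since $c_1 \le 1$; thus $h_\delta\in\mathcal{H}(\beta,L)$.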

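2) The condition $\|h_{j,T} - h_{k,T}\|_2 \ge 2c\,e^{-\frac{\beta}{2\beta+1}RT}$:

For all $\delta,\delta'\in\{0,1\}^D$, we have

\[
\begin{aligned}
\|h_\delta - h_{\delta'}\|_2 &= \Big[\int_0^1\big(h_\delta(\gamma) - h_{\delta'}(\gamma)\big)^2d\gamma\Big]^{1/2}
= c_1\Big[\int_0^1\sum_{k=1}^{D}(\delta_k-\delta'_k)^2f_k^2(\gamma)\,d\gamma\Big]^{1/2}\\
&= c_1\Big[\sum_{k=1}^{D}(\delta_k-\delta'_k)^2\int_{\frac{k-1}{D}}^{\frac{k}{D}}L^2D^{-2\beta}g^2\big(D\gamma-(k-1)\big)\,d\gamma\Big]^{1/2}\\
&= c_1LD^{-\beta-1/2}\|g\|_2\Big[\sum_{k=1}^{D}(\delta_k-\delta'_k)^2\Big]^{1/2}
= c_1LD^{-\beta-1/2}\|g\|_2\sqrt{d_H(\delta,\delta')},
\end{aligned}
\]

where $d_H(\delta,\delta') = \sum_{k=1}^{D}\mathbf{1}_{\{\delta_k\ne\delta'_k\}}$ is the Hamming distance between $\delta$ and $\delta'$.

According to the Lemma of Varshamov-Gilbert (cf. Tsybakov [36], p.104), there exists
a subset $\{\delta^{(0)},\dots,\delta^{(m)}\}$ of $\{0,1\}^D$ with cardinality $m+1$ such that $\delta^{(0)} = (0,\dots,0)$,

\[ m \ge 2^{D/8}, \tag{59} \]

and

\[ d_H\big(\delta^{(j)},\delta^{(k)}\big) \ge \frac{D}{8},\quad\forall\,0\le j<k\le m. \tag{60} \]

Then, by setting $h_{j,T}(x) = h_{\delta^{(j)}}(x)$, $j = 0,\dots,m$, we obtain

\[ \|h_{j,T} - h_{k,T}\|_2 = c_1LD^{-\beta-1/2}\|g\|_2\sqrt{d_H\big(\delta^{(j)},\delta^{(k)}\big)} \ge \frac{c_1L}{2\sqrt2}\,\|g\|_2\,D^{-\beta}, \]

whenever $D \ge 8$.

Suppose that $T \ge T^*$, where $T^* = \frac{2\beta+1}{R}\log\frac{8}{c_0}$. Then $D \ge 8$ and $D^\beta \le (2c_0)^\beta e^{\frac{\beta}{2\beta+1}RT}$. This implies:

\[ \|h_{j,T} - h_{k,T}\|_2 \ge \frac{c_1L}{2\sqrt2}\,\|g\|_2\,(2c_0)^{-\beta}\,e^{-\frac{\beta}{2\beta+1}RT}. \]

Recalling that $c_1 \le 1$ is the constant fixed in (58), we obtain

\[ \|h_{j,T} - h_{k,T}\|_2 \ge 2c\,e^{-\frac{\beta}{2\beta+1}RT}, \]

where

\[ c = \frac{c_1L}{4\sqrt2}\,\|g\|_2\,(2c_0)^{-\beta}. \]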
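The identity $\|h_\delta - h_{\delta'}\|_2 = c_1LD^{-\beta-1/2}\|g\|_2\sqrt{d_H(\delta,\delta')}$ can be checked numerically on a toy example. In the sketch below, $g(x) = \sin(2\pi x)$ on $(0,1)$ is a stand-in bump chosen only because it integrates to zero and has $\|g\|_2^2 = 1/2$ (it is not claimed to satisfy the smoothness required in the proof), and the values of $D$, $\beta$, $L$ and the two binary vectors are arbitrary.

```python
import math

def g(x):
    """Stand-in bump on (0,1) with zero integral; an illustrative assumption."""
    return math.sin(2.0 * math.pi * x) if 0.0 < x < 1.0 else 0.0

def h_delta(gamma, delta, beta, L, c1):
    """h_delta(gamma) = 1 + c1 * sum_k delta_k * L * D^(-beta) * g(D*gamma - (k-1))."""
    D = len(delta)
    k = min(int(gamma * D), D - 1)          # 0-based index of the cell containing gamma
    return 1.0 + c1 * delta[k] * L * D ** (-beta) * g(D * gamma - k)

def l2_distance(delta_a, delta_b, beta, L, c1, points_per_cell=200):
    """Midpoint-rule approximation of the L2 distance on (0,1)."""
    n = len(delta_a) * points_per_cell
    total = 0.0
    for j in range(n):
        gamma = (j + 0.5) / n
        diff = h_delta(gamma, delta_a, beta, L, c1) - h_delta(gamma, delta_b, beta, L, c1)
        total += diff * diff
    return math.sqrt(total / n)

beta, L = 1.5, 1.0
c1 = min(1.0 / (L * 1.0), 1.0)              # min(1 / (L * ||g||_inf), 1) with ||g||_inf = 1
delta_a = (1, 0, 1, 1, 0, 0, 1, 0)
delta_b = (0, 0, 1, 0, 1, 0, 1, 1)
D = len(delta_a)
hamming = sum(a != b for a, b in zip(delta_a, delta_b))
numeric = l2_distance(delta_a, delta_b, beta, L, c1)
formula = c1 * L * D ** (-beta - 0.5) * math.sqrt(0.5) * math.sqrt(hamming)
lowest = min(h_delta((j + 0.5) / 1600, delta_a, beta, L, c1) for j in range(1600))
print(numeric, formula, lowest)
```

The last printed number is the minimum of $h_\delta$ on the grid, which should be nonnegative in agreement with the density check above.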


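3) The condition $\frac{1}{m}\sum_{j=1}^{m}K(P_j,P_0) \le \vartheta\log(m)$ for $0 < \vartheta < 1/8$:

We need to show that for all $\delta\in\{0,1\}^D$,

\[ K(P_\delta,P_0) \le \vartheta\log(m), \]

where

\[ K(P_\delta,P_0) = \mathbb{E}_\delta\Big[\log\frac{dP_\delta}{dP_0}\Big|_{\mathcal{F}_T}\big((Z_t)_{t\in[0,T]}\big)\Big], \]

and where $(Z_t)_{t\in[0,T]}$ is the process defined above, with the random measure $Q$ having intensity
$q(ds,di,d\gamma) = R\,h_\delta(\gamma)\,ds\,n(di)\,d\gamma$.

Here, the difficulty comes from the fact that $N_T$ is variable because the observations
result from a stochastic process $Z_t$. The law of these observations is not a probability
distribution on a fixed $\mathbb{R}^n$, where $n$ would be the sample size, but rather a probability
distribution on a path space. $P_\delta$ is the probability distribution when the Poisson point
measure $Q$ has intensity $R\,h_\delta(\gamma)\,ds\,n(di)\,d\gamma$. Thus a natural tool is Girsanov's
theorem (see [21], Theorem 3.24, p. 159), saying that $P_\delta$ is absolutely continuous with
respect to $P_0$ on $\mathcal{F}_T$ with

\[ \frac{dP_\delta}{dP_0}\Big|_{\mathcal{F}_T} = D^\delta_T, \]

where $(D^\delta_t)_{t\in[0,T]}$ is the unique solution of the following SDE (see Proposition 4.17 of [?]
for a similar SDE):

\[ D^\delta_t = 1 + \int_0^t\int_E D^\delta_{s-}\,\mathbf{1}_{\{i\le N_{s-}\}}\Big(\frac{h_\delta(\gamma)}{h_0(\gamma)} - 1\Big)\,Q(ds,di,d\gamma). \tag{61} \]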
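Applying Itô's formula for jump processes to (61), we get

\[
\begin{aligned}
\log D^\delta_T &= \int_0^T\int_E\Big(\log\Big(D^\delta_{s-}\frac{h_\delta(\gamma)}{h_0(\gamma)}\Big) - \log D^\delta_{s-}\Big)\mathbf{1}_{\{i\le N_{s-}\}}\,Q(ds,di,d\gamma)\\
&= \int_0^T\int_E\log\Big(\frac{h_\delta(\gamma)}{h_0(\gamma)}\Big)\mathbf{1}_{\{i\le N_{s-}\}}\,Q(ds,di,d\gamma)
= \sum_{i=1}^{M_T}\log\frac{h_\delta(\Gamma^1_i)}{h_0(\Gamma^1_i)},
\end{aligned}
\]

by definition of $(\Gamma^1_1,\dots,\Gamma^1_{M_T})$.

Then, since $M_T \le N_T$ and the expected log-likelihood ratio of one observation is nonnegative,

\[ K(P_\delta,P_0) = \mathbb{E}_\delta\big[\log D^\delta_T\big] \le \mathbb{E}[N_T]\;\mathbb{E}_\delta\Big[\log\frac{h_\delta(\Gamma^1_1)}{h_0(\Gamma^1_1)}\Big]. \]

Here, $\mathbb{E}[N_T]$ does not depend on $h_\delta$ and we have $\mathbb{E}[N_T] = N_0e^{RT}$. Thus, recalling the
definition of $h_\delta(\cdot)$ and noting that $\log(1+x) \le x$ for $x > -1$, we get

\[
\begin{aligned}
K(P_\delta,P_0) &\le N_0e^{RT}\int_0^1h_\delta(\gamma)\log\big(h_\delta(\gamma)\big)\,d\gamma\\
&= N_0e^{RT}\int_0^1\Big(1 + c_1\sum_{k=1}^{D}\delta_kf_k(\gamma)\Big)\log\Big(1 + c_1\sum_{k=1}^{D}\delta_kf_k(\gamma)\Big)\,d\gamma\\
&= N_0e^{RT}\sum_{k=1}^{D}\int_{\frac{k-1}{D}}^{\frac{k}{D}}\big(1 + c_1\delta_kf_k(\gamma)\big)\log\big(1 + c_1\delta_kf_k(\gamma)\big)\,d\gamma\\
&= N_0e^{RT}\sum_{k=1}^{D}\delta_k\int_0^{1/D}\big(1 + c_1f(\gamma)\big)\log\big(1 + c_1f(\gamma)\big)\,d\gamma\\
&\le N_0e^{RT}D\int_0^{1/D}\big(1 + c_1f(\gamma)\big)\,c_1f(\gamma)\,d\gamma\\
&= N_0e^{RT}\Big[c_1LD^{-\beta}\int_0^{1/D}g(D\gamma)\,D\,d\gamma + c_1^2L^2D^{-2\beta}\int_0^{1/D}g^2(D\gamma)\,D\,d\gamma\Big]\\
&= N_0e^{RT}c_1^2L^2D^{-2\beta}\int_0^1g^2(\gamma)\,d\gamma\\
&\le N_0c_1^2L^2\|g\|_2^2\,e^{RT}\,c_0^{-(2\beta+1)}e^{-RT}D\\
&\le N_0L^2\|g\|_2^2\,c_0^{-(2\beta+1)}D \quad\text{since } c_1 \le 1,
\end{aligned}
\]

where we used $\int g(\gamma)\,d\gamma = 0$ and $D \ge c_0e^{\frac{RT}{2\beta+1}}$, so that $D^{-2\beta} = D^{-(2\beta+1)}D \le c_0^{-(2\beta+1)}e^{-RT}D$.

From (59), we have $m \ge 2^{D/8}$; then

\[ D \le \frac{8\log(m)}{\log(2)}. \]

Hence, if we set

\[ c_0 = \Big(\frac{8N_0L^2\|g\|_2^2}{\vartheta\log(2)}\Big)^{1/(2\beta+1)}, \]

we obtain $K(P_\delta,P_0) \le \vartheta\log(m)$. This ends the proof of Theorem 4.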